This PDF file contains the front matter associated with SPIE
Proceedings Volume 6806, including the Title Page, Copyright
information, Table of Contents, Introduction (if any), and the
Conference Committee listing.
Images have characteristic statistics that can be described in terms of the responses of wavelet or Gabor-like filters.
There has been a great deal of interest in the fact that images have sparse (kurtotic) statistics in the wavelet domain, with
implications for efficient image encoding in biological and artificial systems. If we set aside the issue of efficiency, we
are still left with the problem of seeing. We have been studying the ways in which filter statistics can reveal useful information
about surfaces, including albedo, shading, and gloss. We find that odd-order statistics such as skewness are
quite useful in extracting information about reflectance and gloss, and we also find evidence that humans make use of
this information. It is straightforward to compute skewness with physiological mechanisms.
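To make "skewness of filter responses" concrete, the sketch below applies a 3×3 center-minus-surround filter to a grayscale image stored as nested lists and reports the sample skewness of the responses. The abstract does not say which filters the authors used, so this simple filter stands in for the wavelet or Gabor-like filters; the toy image and the function names are illustrative assumptions.

```python
def center_surround(img):
    """Apply a 3x3 center-minus-surround filter to the interior pixels of img."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            surround = sum(img[y + dy][x + dx]
                           for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                           if (dy, dx) != (0, 0)) / 8.0
            out.append(img[y][x] - surround)
    return out

def skewness(values):
    """Sample skewness: third central moment divided by variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

# Toy image: dim matte background with a few bright, highlight-like pixels.
img = [[0.2] * 8 for _ in range(8)]
for y, x in [(2, 2), (4, 5), (5, 3)]:
    img[y][x] = 1.0
print(round(skewness(center_surround(img)), 3))
```

Isolated bright pixels give a few large positive responses and many small negative ones, so the response histogram is positively skewed; isolated dark spots on a light background would skew it the other way.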
Numerically modeling the interaction of light with materials is an essential step in generating realistic synthetic
images. While there have been many studies of how people perceive physical materials, very little work has been
done that facilitates efficient numerical modeling. Perceptual experiments and guidelines are needed for material
measurement, specification and rendering. For measurement, many devices and methods have been developed for
capturing spectral, directional and spatial variations of light/material interactions, but no guidelines exist for the
accuracy required. For specification, only very preliminary work has been done to find meaningful parameters for
users to search for and to select materials in software systems. For rendering, insight is needed on the perceptual
impact of material models when combined with global illumination methods.
Single photon detectors are regarded as a key enabling technology in a wide range of medical, industrial, and
military applications. However, the existing single photon detectors that can operate at or near room temperature have
poor efficiency and high noise. Interestingly, the counterparts of these devices in nature, namely the rod cells, have
remarkably high efficiency and low noise. In particular, the noise performance of rod cells is five to six orders of
magnitude better than that of semiconductor-based single photon detectors at room temperature. At the Bio-inspired Sensors and
Optoelectronics Laboratory, we explored the origin of this exceptional noise performance, and designed and implemented a
novel semiconductor device based on the underlying detection mechanism in the rod cells. Our device shows very
promising properties, including orders-of-magnitude higher gain and lower noise than existing devices.
More interestingly, the low operating voltage of the device combined with high gain uniformity should allow, for the
first time, realization of large imaging arrays with a high internal gain. Such imagers would open new opportunities for
novel applications such as quantum ghost imaging.
What makes an image appear to be a veridical representation of a real scene? Knowing what is necessary to produce a
"good" image also aids in the design of more efficient compression algorithms. We review our earlier work on video
compression and demonstrate the substantial savings and excellent image quality produced by spatial low-pass filtering
of most (but not all) of the individual frames. Currently, we work with still images. An example will show that simple
filtering can produce unexpected changes in the perceptual interpretation of a complex scene. We describe and
demonstrate a new compression method we are developing based on the assumption that the fine structure in the
amplitude domain (and perhaps in phase, as well) can be of minimal importance in conveying the essence of a scene. We
find that a complex image can be reproduced surprisingly well by compressing the entire spatial frequency amplitude
spectrum to a very small number of terms.
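The idea of describing the whole amplitude spectrum with very few terms can be sketched in one dimension. The code below keeps the Fourier phase of a signal but replaces its amplitude spectrum with a two-parameter power law, A(f) ≈ c·f^(−α), fitted by least squares in log-log coordinates, plus the DC term. The power-law form, the 1-D setting, and all names are assumptions; the abstract does not say which few terms the authors keep.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part (input assumed conjugate-symmetric)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def power_law_fit(freqs, amps):
    """Least-squares fit of log(amp) = log(c) - alpha * log(f); returns (c, alpha)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(a) for a in amps]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope

def compress_amplitude(signal):
    """Keep DC and all phases; replace the amplitude spectrum by c * f**(-alpha)."""
    n = len(signal)
    X = dft(signal)
    half = range(1, n // 2 + 1)  # positive frequencies, DC excluded
    c, alpha = power_law_fit(list(half), [max(abs(X[k]), 1e-12) for k in half])
    Y = [0j] * n
    Y[0] = X[0]  # mean level is kept as is
    for k in range(1, n):
        f = min(k, n - k)  # same amplitude for +f and -f keeps the output real
        Y[k] = c * f ** (-alpha) * cmath.exp(1j * cmath.phase(X[k]))
    return idft(Y), (c, alpha)
```

Only three numbers (DC, c, α) survive from the amplitude spectrum, while the phase is kept in full, so this illustrates amplitude compression alone; the abstract's remark that phase fine structure may also matter little is not explored here.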
In this paper we investigate the spatial correlation structure of orientation and color information in natural
images. We compare it with the spatial correlation structure of optical recordings of macaque monkey
primary visual cortex, in response to oriented and colored stimuli. We show that the correlation of orientation falls
off rapidly over increasing distance. By using a color metric based on the a-b coordinates in the CIE-Lab color
space, we show that color information, on the other hand, is more highly correlated over larger distances. We
also show that orientation and color information are statistically independent in natural images. We perform
a similar spatial correlation analysis of the cortical responses to orientation and color. We observe a similar
behavior to that of natural images, in that the correlation of orientation-specific responses falls off more rapidly
than the correlation of color-specific responses. Our findings suggest that: (a) orientation and color information
should be processed in separate channels, and (b) the organization of cortical color responses at a lower spatial
frequency compared to orientation is a reflection of the statistical structure of the visual world.
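One way to make "correlation as a function of distance" concrete for a circular variable such as orientation and a two-dimensional chromatic variable such as the a-b coordinates is sketched below. Orientation similarity is taken as the mean of cos 2(θ₁ − θ₂), which respects the 180° periodicity of orientation, and color similarity as the normalized autocorrelation of mean-removed (a, b) vectors. Both measures, the 1-D sampling along a row, and the synthetic data are assumptions for illustration, not the paper's exact definitions.

```python
import math
import random

def orientation_correlation(theta, d):
    """Mean of cos(2*(theta_i - theta_{i+d})); 1 means identical orientations."""
    pairs = list(zip(theta, theta[d:]))
    return sum(math.cos(2 * (a - b)) for a, b in pairs) / len(pairs)

def ab_correlation(ab, d):
    """Normalized autocorrelation of mean-removed (a, b) vectors at offset d."""
    n = len(ab)
    ma = sum(a for a, _ in ab) / n
    mb = sum(b for _, b in ab) / n
    centred = [(a - ma, b - mb) for a, b in ab]
    var = sum(a * a + b * b for a, b in centred) / n
    pairs = list(zip(centred, centred[d:]))
    cov = sum(p[0] * q[0] + p[1] * q[1] for p, q in pairs) / len(pairs)
    return cov / var

# Synthetic row: uncorrelated orientations, slowly drifting hue in the a-b plane.
random.seed(1)
n = 2000
theta = [random.uniform(0, math.pi) for _ in range(n)]
ab = [(10 * math.cos(2 * math.pi * i / n), 10 * math.sin(2 * math.pi * i / n))
      for i in range(n)]
for d in (1, 5, 20):
    print(d, round(orientation_correlation(theta, d), 3), round(ab_correlation(ab, d), 3))
```

Applied to real image rows (or 2-D offsets), plotting both measures against d gives the fall-off curves the abstract compares.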
The human brain has well over 30 cortical areas devoted to visual processing. Classical neuro-anatomical as well as
fMRI studies have demonstrated that early visual areas have a retinotopic organization whereby adjacent locations in
visual space are represented in adjacent areas of cortex within a visual area. At the 2006 Electronic Imaging meeting, we
presented a method using sprite graphics to obtain high-resolution retinotopic visual evoked potential responses using
multifocal m-sequence technology (mfVEP). We have used this method to record mfVEPs from up to 192 non-overlapping
checkerboard stimulus patches scaled such that each patch activates about 12 mm² of cortex in area V1 and
even less in V2. This dense coverage enables us to incorporate cortical folding constraints, given by anatomical MRI
and fMRI results from the same subject, to isolate the V1 and V2 temporal responses. Moreover, the method offers a
simple means of validating the accuracy of the extracted V1 and V2 time functions by comparing the results between
left and right hemispheres that have unique folding patterns and are processed independently. Previous VEP studies
have been contradictory as to which area responds first to visual stimuli. This new method accurately separates the
signals from the two areas and demonstrates that both respond with essentially the same latency. We also introduce a
new method that better isolates cortical areas using an empirically determined forward model. The
method includes a novel steady-state mfVEP and complex SVD techniques. In addition, this evolving technology is put
to use examining how stimulus attributes differentially impact the response in different cortical areas, in particular how
fast nonlinear contrast processing occurs. This question is examined using both state triggered kernel estimation (STKE)
and m-sequence "conditioned kernels". The analysis indicates different contrast gain control processes in areas V1 and
V2. Finally, we show that our m-sequence multifocal stimuli have advantages for integrating EEG and MEG for
improved dipole localization.
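The mfVEP approach relies on binary m-sequences, whose periodic autocorrelation (in ±1 form) is N at zero lag and −1 at every other lag; this is what lets a response kernel be read out by cross-correlating the recording with the stimulus sequence. The sketch below generates a length-15 m-sequence with a linear-feedback shift register (primitive polynomial x⁴ + x + 1, an illustrative choice) and recovers the first-order kernel of a simulated linear response exactly. Real recordings use much longer sequences, many stimulus patches, and higher-order kernels; the simulated kernel `h` and the function names are hypothetical.

```python
def m_sequence(degree, taps):
    """Binary m-sequence from a Fibonacci LFSR: s[n+degree] = XOR of s[n+t] for t in taps."""
    state = [1] + [0] * (degree - 1)  # any nonzero seed works
    out = []
    for _ in range(2 ** degree - 1):
        out.append(state[0])
        new = 0
        for t in taps:
            new ^= state[t]
        state = state[1:] + [new]
    return out

def circular_xcorr(a, b, k):
    """sum_i a[i] * b[(i - k) mod N]."""
    n = len(a)
    return sum(a[i] * b[(i - k) % n] for i in range(n))

bits = m_sequence(4, taps=(0, 1))  # x^4 + x + 1 -> period 15
x = [1 - 2 * s for s in bits]      # map {0, 1} -> {+1, -1}
n = len(x)
print([circular_xcorr(x, x, k) for k in range(n)])  # 15 at lag 0, -1 elsewhere

# Simulated first-order response: circular convolution with a hypothetical kernel h.
h = [0.0, 0.5, 1.0, -0.3] + [0.0] * (n - 4)
y = [sum(h[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]

# Cross-correlation gives c[k] = (N + 1) * h[k] - sum(h), and sum(c) = sum(h),
# so the kernel can be recovered exactly.
c = [circular_xcorr(y, x, k) for k in range(n)]
h_est = [(ck + sum(c)) / (n + 1) for ck in c]
print([round(v, 6) for v in h_est[:5]])
```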
We look at a characterization of metaphor from cognitive linguistics, extracting the salient features of metaphorical processing. We examine the neurobiology of dendrites, specifically spike timing-dependent plasticity (STDP) and the modulation of backpropagating action potentials (bAPs), to generate a neuropil-centric model of cortical processing based on signal timing and reverberation between regions. We show how this model supports the basic features of metaphorical processing extracted earlier. Finally, we model this system using a combination of Euclidean, projective, and hyperbolic geometries, and show how the resulting model accounts for this processing and relates to other neural network models.
Appearance in high dynamic range (HDR) images is controlled by intraocular glare and physiological spatial contrast.
Increasing the number of high-luminance pixels in a display increases glare and reduces the dynamic range of
luminances on the retina. Simultaneous contrast makes areas with higher glare-related luminances look darker. Previous
experiments measured the range needed for the appearance of black in surrounds with variable percentages of white pixels
in the background. In these test targets it was 2.0 log units with 100% white pixels, 2.3 log units with 50% white pixels,
2.9 log units with 8% white pixels, and 5.5 log units with 0% white pixels. We want to calculate the intensity of veiling
glare in these test scenes and relate retinal luminances to the magnitude estimates of appearance reported by observers.
This paper uses a glare spread function to calculate the retinal luminances after intraocular scatter. By modeling the
actual luminances on the retina we can compare them with appearance.
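The effect of surround whites on the retinal luminance of a black target can be sketched by convolving a scene with a glare spread function. The code below uses a hypothetical, energy-conserving 1-D spread function (a fraction 1 − k stays in place, the rest scatters with a 1/(1 + d)² falloff); it is not the glare spread function used in the paper, and the luminances are arbitrary, so the printed values are illustrative only and do not reproduce the measured log ranges.

```python
def gsf_weights(n, k=0.1, p=2.0):
    """Hypothetical circular 1-D glare spread function; weights sum to 1."""
    raw = [0.0] + [1.0 / (1 + min(d, n - d)) ** p for d in range(1, n)]
    total = sum(raw)
    w = [k * r / total for r in raw]
    w[0] = 1.0 - k  # unscattered fraction stays at the pixel
    return w

def retinal_image(lum, w):
    """Circular convolution of scene luminance with the glare spread function."""
    n = len(lum)
    return [sum(lum[j] * w[(i - j) % n] for j in range(n)) for i in range(n)]

def make_scene(n, white_frac, l_white=1000.0, l_black=1.0):
    """Black target at the centre; white surround pixels are added farthest-first."""
    target = n // 2
    dist = lambda i: min(abs(i - target), n - abs(i - target))
    others = sorted((i for i in range(n) if i != target), key=dist, reverse=True)
    lum = [l_black] * n
    for i in others[:round(white_frac * len(others))]:
        lum[i] = l_white
    return lum, target

w = gsf_weights(101)
for frac in (0.0, 0.08, 0.5, 1.0):
    lum, t = make_scene(101, frac)
    print(f"{frac:4.0%} white surround: retinal luminance of target = {retinal_image(lum, w)[t]:.3f}")
```

Because every scattered weight is positive, turning surround pixels white can only raise the veiling glare on the target, which compresses the retinal range between the whites and the black target.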
Many quality metrics take gamma-corrected images as input and assume that pixel code values are scaled
perceptually uniformly. Although this is a valid assumption for darker displays operating in the luminance range
typical for CRT displays (from 0.1 to 80 cd/m²), it is no longer true for much brighter LCD displays (typically
up to 500 cd/m²), plasma displays (small regions up to 1000 cd/m²) and HDR displays (up to 3000 cd/m²).
The distortions that are barely visible on dark displays become clearly noticeable when shown on much brighter
displays. To estimate the quality of images shown on bright displays, we propose a straightforward extension to
popular quality metrics, such as PSNR and SSIM, that makes them capable of handling all luminance levels
visible to the human eye without altering their results for typical CRT display luminance levels. Such extended
quality metrics can be used to estimate the quality of high dynamic range (HDR) images as well as account for
display brightness.
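The general idea, computing PSNR on absolute luminance after a perceptually uniform encoding rather than on display code values, can be sketched as follows. The abstract does not give the authors' encoding, so the sketch swaps in a different, well-documented one, the SMPTE ST 2084 (PQ) curve; unlike the paper's extension, PQ is not designed to reproduce standard PSNR values in the CRT luminance range. The test pattern and the 5% error are made up.

```python
import math

# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(lum):
    """Map absolute luminance in cd/m^2 (0..10000) to a PQ signal in [0, 1]."""
    ym = (max(lum, 0.0) / 10000.0) ** M1
    return ((C1 + C2 * ym) / (1 + C3 * ym)) ** M2

def psnr(ref, distorted, encode=lambda v: v, peak=1.0):
    """PSNR computed after applying `encode` to every pixel value."""
    mse = sum((encode(a) - encode(b)) ** 2 for a, b in zip(ref, distorted)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# The same 5 % relative luminance error on a dim and on a bright rendering.
pattern = [0.2, 0.5, 1.0, 0.8, 0.3]
for peak_lum in (5.0, 500.0):
    ref = [peak_lum * v for v in pattern]
    distorted = [1.05 * v for v in ref]
    print(peak_lum, round(psnr(ref, distorted, pq_encode), 2))
```

On display code values both cases would score identically, whereas in the PQ domain the brighter rendering scores lower, in line with the abstract's point that the same distortion is more visible on brighter displays.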
Contrast in image processing is typically scaled using a power function (gamma) whose exponent specifies the amount
of physical contrast change. While the exponent is normally constant for the whole image, we observe that such scaling
leads to perceptual nonuniformity in the context of high dynamic range (HDR) images. This effect is mostly due to the lower
contrast sensitivity of the human eye at low luminance levels. Such levels can be reproduced by an HDR display
but cannot be reproduced by standard display technology. We conduct two perceptual experiments on a complex
image: contrast scaling and contrast discrimination threshold, and we derive a model which relates changes of physical
and perceived contrasts at different luminance levels. We use the model to adjust the exponent value such that we obtain
better perceptual uniformity of global and local contrast scaling in complex images.
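A global power function multiplies every log-luminance contrast by the same exponent γ, whatever the absolute luminance; this is the physically uniform scaling that the abstract reports becomes perceptually nonuniform on HDR displays. The sketch below makes that explicit; the pivot luminance, the exponent, and the sample values are assumptions, and the authors' model for adjusting the exponent by luminance level is not reproduced.

```python
import math

def scale_contrast(lum, gamma, pivot):
    """Power-function contrast scaling around a pivot: L' = pivot * (L / pivot)**gamma."""
    return [pivot * (v / pivot) ** gamma for v in lum]

def log_contrast(l1, l2):
    """Contrast between two luminances in log10 units."""
    return math.log10(l1 / l2)

lum = [0.05, 1.0, 20.0, 400.0]  # cd/m^2, spanning dark to bright HDR levels
out = scale_contrast(lum, gamma=1.5, pivot=20.0)
for (a, b), (a2, b2) in zip(zip(lum, lum[1:]), zip(out, out[1:])):
    print(f"{log_contrast(b, a):.3f} -> {log_contrast(b2, a2):.3f}")
```

Every step is multiplied by exactly 1.5, including the step between the two darkest levels, where lower contrast sensitivity makes the change look smaller; a luminance-dependent exponent is what would compensate for that.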
The complexity of a polygonal mesh is usually reduced by applying a simplification method, resulting in a similar
mesh with fewer vertices and faces. Although several such methods have been developed, only a few observer
studies comparing the perceived quality of the simplified meshes have been reported, and it is not yet clear how the
choice of a given method, and the level of simplification achieved, influence the quality of the resulting mesh, as
perceived by the final users. Similar issues occur regarding other mesh processing methods such as smoothing.
Mesh quality indices are the obvious, less costly alternative to user studies, but it is also not clear how they relate
to perceived quality, and which indices best describe the users' behavior.
This paper describes ongoing work concerning the evaluation of perceived quality of polygonal meshes using
observer studies, while looking for a quality index which estimates user performance. In particular, given some
results obtained in previous studies, a new experimental protocol was designed and a study involving 55 users
was carried out, which allowed their validation, as well as further insight regarding mesh quality, as perceived
by human observers.
How do human observers perceive visual complexity in images? This problem is especially relevant for computer graphics,
where a better understanding of visual complexity can aid in the development of more advanced rendering algorithms. In
this paper, we describe a study of the dimensionality of visual complexity in computer graphics scenes. We conducted
an experiment where subjects judged the relative complexity of 21 high-resolution scenes, rendered with photorealistic
methods. Scenes were gathered from web archives and varied in theme, number and layout of objects, material properties,
and lighting.
We analyzed the pooled subject responses using multidimensional scaling (MDS). This analysis embedded
the stimulus images in a two-dimensional space, with axes that roughly corresponded to "numerosity" and "material /
lighting complexity". In a follow-up analysis, we derived a one-dimensional complexity ordering of the stimulus images.
We compared this ordering with several computable complexity metrics, such as scene polygon count and JPEG compression
size, and found that none of them was strongly correlated with it. Understanding the differences between these measures can lead to
the design of more efficient rendering algorithms in computer graphics.
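Comparing a perceptual ordering with a computable metric such as JPEG file size is naturally done with a rank correlation. The abstract does not name the statistic used, so the sketch below assumes Spearman's ρ (Pearson correlation of the ranks, with average ranks for ties); the six complexity ranks and file sizes in the example are made up.

```python
def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

perceived = [1, 2, 3, 4, 5, 6]            # hypothetical complexity order of six scenes
jpeg_kb = [310, 120, 450, 200, 380, 260]  # hypothetical JPEG file sizes
print(round(spearman(perceived, jpeg_kb), 3))
```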
Ubiquitous computing (or Ambient Intelligence) promises a world in which information is available anytime anywhere
and with which humans can interact in a natural, multimodal way. In such a world, perceptual image quality remains an
important criterion since most information will be displayed visually, but other criteria such as enjoyment, fun,
engagement and hedonic quality are emerging. This paper deals with engagement, the intrinsically enjoyable readiness to
put more effort into exploring and/or using a product than strictly required, thus attracting and keeping the user's attention
for a longer period of time. The impact of the experienced richness of an interface, both visually and in the degree of possible
manipulations, was investigated in a series of experiments employing game-like user interfaces. This resulted in the
extension of an existing conceptual framework relating engagement to richness by means of two intermediating
variables, namely experienced challenge and sense of control. Predictions from this revised framework are evaluated
against results of an earlier experiment assessing the ergonomic and hedonic qualities of interactive media. Test material
consisted of interactive CD-ROMs containing presentations of three companies aimed at prospective customers.
In two experiments, the effect of sound on visual information was investigated. In Experiment 1, the effect of the visual
appearance of product types with an expensive design and with an inexpensive design on the experience of the sound
recordings of these products was investigated. Recordings and pictures were systematically interchanged. Thus, for
example, the visual image of an expensive design was combined with a recording of the sound of an inexpensive and of
an expensive design. It was found that product appearance did not affect judgments of luxury, pleasantness, quality,
and ease of use, but that the experience of the sound dominated over the visual experience. In Experiment 2, pictures
from the International Affective Picture System were combined with frequency-modulated tones that varied in
sensory pleasantness through manipulation of their roughness. The combinations of sounds and pictures were rated on
the valence and arousal dimensions of the circumplex model of core affect. It was found that the sounds only negatively
affected the experience of the pictures on the valence dimension. The arousal level was not affected by the sounds. Both
experiments show that sound can affect the perception and experience of pictures.
A new application for VR has emerged: product development, in which several stakeholders (from engineers to end
users) use the same VR for development and communication purposes. These stakeholders differ considerably in various
characteristics, which imposes potential constraints on the VR. The current paper discusses the influence of three
types of exploration of objects (i.e., none, passive, active) on one of these characteristics: the ability to form mental
representations or visuo-spatial ability (VSA). In an experiment, we found that all users benefit from exploring
objects. Moreover, people with low VSA (e.g., end users) benefit from an interactive exploration of objects, as opposed to
people with a medium or high VSA (e.g., engineers), who are not sensitive to the type of exploration. Hence, for VR
environments in which multiple stakeholders participate (e.g. for product development), differences among their
cognitive abilities (e.g., VSA) have to be taken into account to enable an efficient usage of VR.
We introduce a novel system that allows users to experience the sensation of touch in a computer graphics environment.
In this system, the user places his/her hand on an array of pins, which is moved through space by a 6 degree-of-freedom
robot arm. The surface of the pins defines a surface in the virtual world. This "virtual hand" can move about the
virtual world. When the virtual hand encounters an object in the virtual world, the heights of the pins are adjusted so
that they represent the object's shape, surface, and texture. A control system integrates pin and robot arm motions to
transmit information about objects in the computer graphics world to the user. It also allows the user to edit, change and
move the virtual objects, shapes and textures. This system provides a general framework for touching, manipulating,
and modifying objects in a 3-D computer graphics environment, which may be useful in a wide range of applications,
including computer games, computer aided design systems, and immersive virtual worlds.
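The abstract does not describe the control law, so the following is only a hypothetical sketch of the geometric step it mentions: setting each pin's height from the virtual object's surface under the hand. It assumes a level hand, a height-field object, and pins that extend by the amount the surface rises above the pin bases, clamped to the pin travel; robot-arm motion and editing are not modeled, and all names and dimensions are illustrative.

```python
import math

def pin_extensions(pins, palm_z, surface_height, max_travel):
    """Hypothetical pin update: extension = surface height above the pin bases, clamped.

    pins: (x, y) world positions of the pins (hand assumed level)
    surface_height: function (x, y) -> object surface height, or None where there is no object
    """
    ext = []
    for x, y in pins:
        h = surface_height(x, y)
        ext.append(0.0 if h is None else min(max(h - palm_z, 0.0), max_travel))
    return ext

def sphere_cap(cx, cy, cz, r):
    """Height field of the top of a sphere; None outside its footprint."""
    def height(x, y):
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        return cz + math.sqrt(r * r - d2) if d2 <= r * r else None
    return height

pins = [(0.01 * i, 0.01 * j) for j in range(4) for i in range(4)]  # 4x4 grid, 1 cm pitch
bump = sphere_cap(0.015, 0.015, -0.02, 0.03)  # sphere whose top pokes 1 cm above the palm
print([round(e, 4) for e in pin_extensions(pins, palm_z=0.0, surface_height=bump, max_travel=0.01)])
```

The central pins rise highest and the corner pins least, so the pin tips trace the curvature of the bump under the hand.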
We tend to think of our body image as fixed. However, human brains appear to support highly negotiable body images.
As a result, our brains show a remarkable flexibility in incorporating non-biological elements (tools and technologies)
into the body image, provided reliable, real-time intersensory correlations can be established, and artifacts can be
plausibly mapped onto an already existing body image representation. A particularly interesting and relevant
phenomenon in this respect is a recently reported crossmodal perceptual illusion known as the rubber-hand illusion
(RHI). When a person is watching a fake hand being stroked and tapped in precise synchrony with his or her own unseen
hand, the person will, within a few minutes of stimulation, start experiencing the fake hand as an actual part of his or her
own body. In this paper, we will review recent work on the RHI and argue that such experimental transformation of the
intimate ties between body morphology, proprioception and self-perception enhances our fundamental understanding of
the phenomenal experience of self. Moreover, it will enable us to significantly improve the design of interactive media,
including the design of avatars in virtual environments and digital games, as well as a range of human-like telerobotic
devices.
Minimally invasive therapy (MIT) is one of the most important trends in modern medicine. It includes a wide range of
therapies in videoscopic surgery and interventional radiology and is performed through small incisions. It shortens
hospital stays by allowing faster recovery and offers substantially improved cost-effectiveness for the hospital and
for society. However, the introduction of MIT has also led to new problems. The manipulation of structures within the
body through small incisions reduces dexterity and tactile feedback. It requires a different approach than conventional
surgical procedures, since eye-hand coordination is not based on direct vision but predominantly on image
guidance via endoscopes or radiological imaging modalities. ARIS*ER is a multidisciplinary consortium developing a
new generation of decision support tools for MIT by augmenting visual and sensorial feedback. We will present tools
based on novel concepts in visualization, robotics and haptics providing tailored solutions for a range of clinical
applications. Examples from radio-frequency ablation of liver-tumors, laparoscopic liver surgery and minimally invasive
cardiac surgery will be presented. Demonstrators were developed with the aim of providing a seamless workflow for the
clinical user conducting image-guided therapy.
A key problem of artificial visual prostheses is the low resolution imposed by the limited number of electrodes. Various methods, such as edge detection and contrast enhancement, have been studied as solutions to the low-resolution problem, and these methods have been applied to face or object recognition in close-up images. In this paper, we propose a region-of-interest detection method using a context-based model, which is appropriate for real situations. The visually salient region was detected by combining the saliency map with color information. To evaluate the proposed model, gaze was recorded with an eye tracker while subjects watched the original image and two types of 10 × 10 pixelized images, produced by the conventional and the saliency-based method, respectively. The gaze on each pixelized image was compared with the gaze on the original image. The experiment showed that gaze on images produced with the proposed context-based model correlated much more strongly with gaze on the original image than gaze on images produced with the conventional model.
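The 10 × 10 pixelized images used in the experiment can be emulated by block averaging, as in the sketch below. How the saliency-based method chooses what to pixelize is specified only as combining a saliency map with color information, so `roi` here is a hypothetical bounding box supplied by the caller, and cropping to it before pixelizing is an assumption.

```python
def pixelize(img, out_h=10, out_w=10):
    """Downsample a 2-D grayscale image (nested lists) to out_h x out_w by block averaging.

    Assumes the image is at least out_h x out_w pixels.
    """
    h, w = len(img), len(img[0])
    out = []
    for by in range(out_h):
        y0, y1 = by * h // out_h, (by + 1) * h // out_h
        row = []
        for bx in range(out_w):
            x0, x1 = bx * w // out_w, (bx + 1) * w // out_w
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def pixelize_roi(img, roi, out_h=10, out_w=10):
    """Hypothetical saliency-guided variant: crop to roi = (top, left, bottom, right), then pixelize."""
    top, left, bottom, right = roi
    return pixelize([row[left:right] for row in img[top:bottom]], out_h, out_w)

img = [[y * 20 + x for x in range(20)] for y in range(20)]  # toy 20x20 ramp image
print(pixelize(img)[0][:3])
```

Pixelizing a salient crop spends the same 100 "electrodes" on a smaller, more informative region than pixelizing the whole frame.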
The environments we live in and the tasks we perform in those environments have shaped the design of our visual
systems through evolution and experience. This is an obvious statement, but it implies three fundamental components
of research we must have if we are going to gain a deep understanding of biological vision systems: (a) a rigorous
science devoted to understanding natural environments and tasks, (b) mathematical and computational analysis of how
to use such knowledge of the environment to perform natural tasks, and (c) experiments that allow rigorous
measurement of behavioral and neural responses, either in natural tasks or in artificial tasks that capture the essence of
natural tasks. This approach is illustrated with two example studies that combine measurements of natural scene
statistics, derivation of Bayesian ideal observers that exploit those statistics, and psychophysical experiments that
compare human and ideal performance in naturalistic tasks.
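At the core of each ideal-observer analysis is a decision rule that combines measured scene statistics (priors and likelihoods) optimally. The sketch below is a generic maximum-a-posteriori observer with Gaussian likelihoods and a Monte Carlo estimate of its percent correct; the Gaussian form, the class names "A" and "B", and the numbers are placeholders, not the statistics measured in the two studies.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def ideal_observer(x, classes):
    """MAP decision: the class with the largest prior * likelihood.

    classes: dict name -> (prior, mu, sigma) for the assumed Gaussian feature distribution.
    """
    return max(classes, key=lambda c: classes[c][0] * gaussian_pdf(x, *classes[c][1:]))

def percent_correct(classes, trials=10000, seed=0):
    """Simulate trials drawn from the class statistics and score the MAP decisions."""
    rng = random.Random(seed)
    names = list(classes)
    priors = [classes[c][0] for c in names]
    correct = 0
    for _ in range(trials):
        true = rng.choices(names, weights=priors)[0]
        _, mu, sigma = classes[true]
        correct += ideal_observer(rng.gauss(mu, sigma), classes) == true
    return correct / trials

classes = {"A": (0.5, 0.0, 1.0), "B": (0.5, 3.0, 1.0)}  # placeholder statistics
print(percent_correct(classes))
```

Human percent correct measured in the same task can then be compared with this ceiling, which is the comparison the abstract describes.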
Hyperspectral image data can provide very fine spectral resolution with more than 200 bands, yet displaying such rich
information on a tristimulus monitor is a challenge for visualization techniques. This study developed a
visualization technique by taking advantage of both the consistent natural appearance of a true color image and the
feature separation of a PCA image based on a biologically inspired visual attention model. The key part is to extract the
informative regions in the scene. The model takes into account human contrast sensitivity functions and generates a
topographic saliency map for both images. This is accomplished using a set of linear "center-surround" operations
simulating visual receptive fields as the difference between fine and coarse scales. A difference map between the
saliency map of the true color image and that of the PCA image is derived and used as a mask on the true color image to
select a small number of interesting locations where the PCA image has more salient features than available in the
visible bands. The resulting representations preserve the hues of vegetation, water, roads, etc., while the selected attentional
locations may be analyzed by more advanced algorithms.
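The saliency-difference step can be sketched as follows. This is a minimal sketch under stated assumptions: box blurs with hypothetical radii `fine` and `coarse` stand in for the fine and coarse scales of the attention model, the contrast sensitivity weighting is omitted, and both inputs are single-channel maps (for example, the luminance of the true color image and the first PCA component). All function names are illustrative, not the authors' code.

```python
def box_blur(img, r):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def center_surround(img, fine=0, coarse=2):
    """Saliency as |fine-scale response - coarse-scale response| at every pixel."""
    center = box_blur(img, fine)
    surround = box_blur(img, coarse)
    return [[abs(c - s) for c, s in zip(crow, srow)]
            for crow, srow in zip(center, surround)]


def attention_locations(true_color_lum, pca_band, k=3):
    """Up to k (row, col) positions where the PCA image is more salient than the true color image."""
    s_true = center_surround(true_color_lum)
    s_pca = center_surround(pca_band)
    diff = [(s_pca[y][x] - s_true[y][x], (y, x))
            for y in range(len(s_pca)) for x in range(len(s_pca[0]))]
    # keep only positive differences (the mask), strongest first
    positive = sorted((d for d in diff if d[0] > 0), reverse=True)
    return [loc for _, loc in positive[:k]]
```

A feature that is present only in the PCA band (for example, a single bright pixel on a flat true color background) is returned as the top attentional location, while identical inputs yield no locations at all.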
Defined as an attentive process in the context of visual sequences, dynamic visual attention refers to the selection
of the most informative parts of a video sequence. This paper investigates the contribution of motion to dynamic
visual attention, and specifically compares computer models in which the motion component is expressed
either as the speed magnitude or as the speed vector. Several computer models, including static features (color,
intensity and orientation) and motion features (magnitude and vector), are considered. Qualitative and quantitative
evaluations are performed by comparing the computer model output with human saliency maps obtained
experimentally from eye movement recordings. The suitability of each model is evaluated in various situations (synthetic
and real sequences, acquired with a fixed or moving camera), showing the advantages and drawbacks
of each method as well as its preferred domain of application.
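To illustrate why the two motion representations can behave differently with a moving camera, the sketch below assumes a dense optical-flow field, takes the magnitude feature as |v| and, as an assumption about what a vector-based feature can capture, measures each vector's deviation from the mean (dominant) motion. Model maps are compared with a human saliency map using Pearson's correlation coefficient, one common quantitative criterion; the paper's own features and metrics may differ.

```python
import math


def magnitude_map(flow):
    """Speed magnitude |v| per pixel; flow is a list of rows of (vx, vy) tuples."""
    return [[math.hypot(vx, vy) for vx, vy in row] for row in flow]


def vector_map(flow):
    """Length of each motion vector after subtracting the mean (dominant) motion."""
    n = sum(len(row) for row in flow)
    mx = sum(vx for row in flow for vx, _ in row) / n
    my = sum(vy for row in flow for _, vy in row) / n
    return [[math.hypot(vx - mx, vy - my) for vx, vy in row] for row in flow]


def pearson_cc(model_map, human_map):
    """Pearson correlation between two saliency maps of equal size."""
    a = [v for row in model_map for v in row]
    b = [v for row in human_map for v in row]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat map carries no spatial information
    return cov / math.sqrt(va * vb)
```

In a camera pan that keeps a tracked object stationary in the frame, the magnitude map highlights the moving background and anti-correlates with gaze, while the deviation map highlights the object.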
The importance of motion in attracting attention is well known. While watching videos, where motion is
prevalent, how do we quantify the regions that are motion salient? In this paper, we investigate the role of
motion in attention and compare it with the influence of other low-level features like image orientation and
intensity. We propose a framework for motion saliency. In particular, we integrate motion vector information
with spatial and temporal coherency to generate a motion attention map. The results show that our model
achieves good performance in identifying regions that are moving and salient. We also find motion to have
greater influence on saliency than other low-level features when watching videos.
According to the literature, automatic video summarization techniques can be classified into two categories according to
the nature of their output: "video skims", which are generated from portions of the original video, and "key-frame sets",
which consist of images selected from the original video that have significant semantic content. The difference between
these two categories is reduced when we consider automatic procedures. Most published approaches are based on
the image signal and use pixel characterization, histogram techniques, or block-based image decomposition.
However, few of them integrate properties of the Human Visual System (HVS). In this paper, we propose to extract key-frames
for video summarization by studying the variations of salient information between two consecutive frames. For
each frame, a saliency map is produced by simulating human visual attention with a bottom-up (signal-dependent)
approach. This approach includes three parallel channels for processing three early visual features: intensity, color and
temporal contrasts. For each channel, the variations of the salient information between two consecutive frames are
computed. These outputs are then combined to produce the global saliency variation, which determines the key-frames.
Psychophysical experiments were designed and conducted to assess the relevance of the proposed key-frame
extraction algorithm.
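A minimal sketch of the key-frame selection logic, assuming the per-channel saliency maps are already computed and flattened, that the per-channel variation is the mean absolute difference between consecutive maps, that the channels are combined with equal weights, and that a frame becomes a key-frame when the global variation exceeds a fixed threshold. These choices are illustrative; the abstract does not specify them.

```python
def channel_variation(prev, curr):
    """Mean absolute change of one flattened saliency map between consecutive frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def saliency_variation(frames, weights=None):
    """Global saliency variation for each frame t >= 1 relative to frame t - 1.

    frames: list of dicts mapping a channel name (e.g. "intensity", "color",
    "temporal") to that frame's flattened saliency map.
    """
    channels = list(frames[0])
    if weights is None:
        weights = {c: 1.0 / len(channels) for c in channels}  # equal weights (assumed)
    return [sum(weights[c] * channel_variation(frames[t - 1][c], frames[t][c])
                for c in channels)
            for t in range(1, len(frames))]


def key_frames(frames, threshold):
    """Frame 0 plus every frame whose saliency variation exceeds the threshold."""
    variation = saliency_variation(frames)
    return [0] + [t for t, v in enumerate(variation, start=1) if v > threshold]
```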
Visual Perception in the Detection and Tracking of Objects
Multiple Object Tracking (MOT) experiments show that human observers can track up to five moving targets among
several moving distractors over several seconds. We extended these studies by designing modified MOT experiments
to investigate the spatio-temporal characteristics of human visuo-cognitive mechanisms for tracking, and applied the
findings and insights obtained from these experiments to the design of computational multiple object tracking algorithms.
Recent studies indicate that attention both enhances the neural activity of relevant information and suppresses
irrelevant visual information in the surround. Results of our experiments suggest that the suppressive surround of
attention extends up to 4 deg from the target stimulus and takes at least 100 ms to build up. We suggest that when the
attentional windows corresponding to separate target regions are spatially close, they can be grouped to form a single
attentional window to avoid interference originating from suppressive surrounds. The grouping experiment results
indicate that the attentional windows are grouped into a single one when the distance between them is less than 1.5 deg.
Preliminary implementation of the suppressive surround concept in our computational video object tracker resulted in
fewer unnecessary object merges in computational video tracking experiments.
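The 1.5 deg grouping rule can be sketched as single-linkage clustering of attentional window centers with a union-find structure. Coordinates are assumed to be in degrees of visual angle, and treating grouping as transitive (A near B and B near C puts all three in one window) is an assumption, not a finding reported in the abstract.

```python
import math

GROUPING_DISTANCE_DEG = 1.5  # grouping threshold reported in the abstract


def group_windows(centers, threshold=GROUPING_DISTANCE_DEG):
    """Group window centers (in deg) whose pairwise distance is below threshold."""
    parent = list(range(len(centers)))

    def find(i):
        # follow parent links to the group root, compressing the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Windows exactly 1.5 deg apart stay separate, matching the strict "less than" wording of the result.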
This paper presents the results of two psychophysical experiments designed to investigate the effects of size,
location, blur, and contrast on the perceived visual interest of objects within images. In the first experiment,
digital compositing was used to create images containing objects (humans, animals, and non-living objects)
which varied in controlled increments of size, location, blur, and contrast. Ratings of perceived interest were
then measured for each object. We found that: (1) As object size increases, perceived interest increases but
exhibits diminished gains for larger sizes; (2) As an object moves from the center of the image toward the
image's edge, perceived interest decreases nearly linearly with distance; (3) Blurring imposes a substantial initial
decrease in perceived interest, but the drop is relatively smaller for highly blurred objects; (4) As an object's
RMS contrast is increased, perceived interest increases nearly linearly. Furthermore, these trends were quite
similar for all three categories (human, animal, non-living object). To determine whether these data can predict
the perceived interest of objects in real, non-composited images, a second experiment was performed in which
subjects rated the visual interest of each of 562 objects in 150 images. Based on these results, an algorithm is
presented which, given a segmented image, attempts to generate an object-level interest map.
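Two of the manipulated quantities are easy to compute. The sketch below uses a common definition of RMS contrast (the standard deviation of intensities normalized to [0, 1]) and a hypothetical normalized distance of the object centroid from the image center; the exact definitions used in the experiments may differ.

```python
import math


def rms_contrast(pixels):
    """RMS contrast: standard deviation of intensities normalized to [0, 1]."""
    n = len(pixels)
    mean = sum(pixels) / n
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)


def normalized_center_distance(cx, cy, width, height):
    """Distance of an object's centroid from the image center: 0 at the center, 1 at a corner."""
    dx = cx - width / 2
    dy = cy - height / 2
    return math.hypot(dx, dy) / math.hypot(width / 2, height / 2)
```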
The pupil dilation reflex is mediated by inhibition of the parasympathetic Edinger-Westphal oculomotor complex and
sympathetic activity. It has long been documented that emotional and sensory events elicit a pupillary reflex dilation. Is
the pupil response a reliable marker of a visual detection event? In two experiments where viewers were asked to report
the presence of a visual target during rapid serial visual presentation (RSVP), pupil dilation was significantly associated
with target detection. The amplitude of the dilation depended on the frequency of targets and the time of the detection.
Larger dilations were associated with trials having fewer targets and with targets viewed earlier during the trial. We also
found that dilation was strongly influenced by the visual task.
With the increased use of multimedia technologies, image compression has become increasingly popular. Compression
reduces the high demands for storage capacity and transmission bandwidth. However, compressing an image loses
some of its information, since compression smooths high frequencies and thereby distorts small details.
This issue is crucial, especially in military, intelligence and medical systems. When planning such systems, one must
consider the image compression quality as well as how it affects the performance of the mission carried out by the user.
Our goal is to examine the behavior of the human eye during image scanning and to quantify the effect of image
compression on observer tasks such as target acquisition. For this task, we used the JPEG2000 standard to
compress the images at levels ranging from 10% (the strongest compression) to 100% (the original image). We
found that animation images were more affected by compression than thermal images. In general, as the compression
ratio increased, the ability to acquire the targets decreased.
Adaptation exerts a continuous influence on visual coding, altering both sensitivity and appearance whenever there is a
change in the patterns of stimulation the observer is exposed to. These adaptive changes are thought to improve visual
performance by optimizing both discrimination and recognition, but may take substantial time to fully adjust the
observer to a new stimulus context. Here we explore the advantages of instead adapting the image to the observer,
obviating the need for sensitivity changes within the observer. Adaptation in color vision adjusts to both the average
color and luminance and to the variations in color and luminance within the scene. We modeled these adjustments as
gain changes in the cones and in multiple post-receptoral mechanisms tuned to stimulus contrasts along different color-luminance
directions. Responses within these mechanisms were computed for a range of different environments, based
on images sampled from a range of natural outdoor settings. Images were then adapted for different environments by
scaling the responses so that for each mechanism the average response equaled the response to a reference environment.
Transforming images in this way can increase the discriminability of different colors and the salience of novel colors. It
also provides a way to simulate how the world might look to an observer in different environments or to different
observers in the same environment. Such images thus provide a novel tool for exploring color appearance and the
perceptual and functional consequences of adaptation.
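The gain-scaling idea can be sketched as a per-mechanism, von Kries-style normalization: each mechanism's responses are multiplied by a gain so that their average matches the average in a reference environment. Using the mean absolute response, so that signed contrast mechanisms are handled the same way as cone signals, is an assumption, and the mechanism names are hypothetical.

```python
def mean_abs(values):
    """Mean absolute response of one mechanism over all pixels."""
    return sum(abs(v) for v in values) / len(values)


def adapt_to_reference(responses, reference):
    """Scale each mechanism so its mean |response| equals the reference environment's.

    responses, reference: dicts mapping a mechanism name (e.g. "L-M") to a
    list of per-pixel responses in that mechanism.
    """
    adapted = {}
    for name, values in responses.items():
        current = mean_abs(values)
        # a silent mechanism is left unchanged instead of dividing by zero
        gain = mean_abs(reference[name]) / current if current else 1.0
        adapted[name] = [gain * v for v in values]
    return adapted
```

Because each mechanism gets a single multiplicative gain, the sign of every response (for example, reddish versus greenish along an L-M axis) is preserved; only the scale of the contrasts changes.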
Pictures can be drawn by hand, or imaged by optical means. Over time, pictures have changed from being rare and
unique to ubiquitous and common. They have changed from treasures to transients. This paper summarizes many
picture technologies, and discusses their dynamic range, their color and tone-scale rendering and their spatial image
processing.
High Dynamic Range (HDR) image capture and display has long been an interest for artists and photographers. The
discipline of reproducing scenes with a high range of luminances has a 5-century history that includes painting,
photography, electronic imaging and image processing. HDR images render high-range scene information into low-range
reproductions. This paper studies the artistic techniques and scientific issues that control HDR image capture and
reproduction. Both the artist and the scientist synthesize HDR reproductions with spatial image processing. The artists
paints, or dodges and burns, the image he visualizes based on his human visual processing. The artist
algorithms that mimic vision, calculates perceptually correct renditions with inaccurate reproductions of scene radiances.
The paper will discuss artists' techniques used in both painting and photography for HDR compression. It will also
describe how optical veiling glare severely limits the range of luminance that can be captured and seen. The
improvement in quality in digital HDR reproductions, as in HDR in art, depends on the spatial rendering of details in the
highlights and shadows.
As computer graphics continues to progress towards photo-realism, one branch of computer graphics has begun to
consider non-photorealistic rendering1,2 as a dedicated research domain. Digital imaging has long been tied to
photography, both digital and analog, and has long been focused on achieving, maintaining, demonstrating or
characterizing photographic quality. There has generally been limited effort in the area of non-photographic imaging.
This paper proposes that art or the artistic process can be used to inspire additional directions in imaging. More
specifically, a number of examples, both personal and from established artists, will be used to demonstrate a range of
non-photographic imaging techniques. The discussion broadly covers experimental imaging processes, use of text,
multiple image constructions and algorithmic cartooning.
It is often believed that modern viewers of visually presented information need to be pleased or kept focused by
feeding them several types of input simultaneously: the primary information and, in addition, what are regarded as
embellishments, such as patterned rather than plain, uniform backgrounds for folders and presentation slides. However,
there are many cases in which the utility or efficiency of transmitting the presented information
and the aesthetic aspects inherent to its presentation are opposed. Examples in static images include: color combinations of
foreground and background in text and figures such as graphs that impede legibility; the use of low-contrast secondary
information in the form of figures or text in the same plane as the intended primary information; and gloss, causing
specular reflection and sometimes glare, applied to bezels of visual displays or to the face of the display itself.
Aesthetically intended aspects of dynamic images, such as flashing parts, may even cause health hazards, for example
photosensitive seizures.
Being aware of the possible opposition of utility and attractiveness means that a sensible choice can be made for the
relative strengths of the information-bearing and the aesthetic factors - including a 'strength zero' of the latter, if need be.
Most efficient objective image or video quality metrics are based on properties and models of the Human Visual System (HVS). This paper deals with two major issues related to the HVS properties used in such metrics when they are applied in the DWT domain: subband decomposition and the masking effect. The multi-channel behavior of the HVS can be emulated by applying a perceptual subband decomposition. Ideally, this is performed in the Fourier domain, but the computational cost is too high for many applications. A spatial transform such as the DWT is a good alternative for reducing the computational effort, but the correspondence between the perceptual subbands and the usual wavelet subbands is not straightforward. The advantages and limitations of the DWT are discussed and compared with models based on the DFT. Visual masking is a sensitive issue, and several models exist in the literature. The simplest models can only predict the visibility threshold for very simple cues, whereas natural images call for more complex approaches such as entropy masking. The main issue lies in finding a revealing measure of the surround influence and an appropriate adaptation: should we use the spatial activity, the entropy, the type of texture, etc.? In this paper, different visual masking models using the DWT are discussed and compared.
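To make the DWT side of the discussion concrete, the sketch below computes one level of an orthonormal 2-D Haar transform, which yields octave-spaced subbands with only three orientations (horizontal, vertical, and a diagonal band that mixes both diagonals), unlike the finer orientation tuning of perceptual channels. It also shows a generic spatial-activity masking rule that raises the visibility threshold with the mean coefficient magnitude in a neighborhood. The masking rule, its `base` threshold, and the exponent 0.7 are illustrative assumptions and do not correspond to any particular model compared in the paper.

```python
def haar2d(img):
    """One level of an orthonormal 2-D Haar DWT; img must have even height and width."""
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]          # top-left, top-right
            c, d = img[y + 1][x], img[y + 1][x + 1]  # bottom-left, bottom-right
            ll[y // 2][x // 2] = (a + b + c + d) / 2  # low-pass approximation
            lh[y // 2][x // 2] = (a + b - c - d) / 2  # top minus bottom: horizontal edges
            hl[y // 2][x // 2] = (a - b + c - d) / 2  # left minus right: vertical edges
            hh[y // 2][x // 2] = (a - b - c + d) / 2  # checkerboard: both diagonals mixed
    return ll, lh, hl, hh


def masked_threshold(band, y, x, base=1.0, exponent=0.7, radius=1):
    """Visibility threshold for band[y][x], raised by local coefficient activity."""
    h, w = len(band), len(band[0])
    window = [abs(band[j][i])
              for j in range(max(0, y - radius), min(h, y + radius + 1))
              for i in range(max(0, x - radius), min(w, x + radius + 1))]
    activity = sum(window) / len(window)
    # no elevation while activity stays below base; power-law elevation above it
    return base * max(1.0, (activity / base) ** exponent)
```

The transform is energy preserving (the squared coefficients of each 2x2 block sum to the squared pixel values), which is one reason the DWT is cheap and convenient; the price is the coarse, fixed orientation and frequency partition noted above.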
We used image difference metrics to measure the quality of a set of images, in order to determine how well they predict
perceived image difference. We carried out a psychophysical experiment with 25 observers while recording
the observers' gaze positions. The image difference metrics used were CIELAB ΔEab, S-CIELAB, the hue angle
algorithm, iCAM and SSIM. A frequency map from the eye tracker data was applied as a weighting to the image
difference metrics. The results indicate an improvement in correlation between the predicted image difference
and the perceived image difference.
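A sketch of gaze weighting applied to the simplest of these metrics, CIELAB ΔEab (the Euclidean distance between two L*a*b* triples), assuming per-pixel (L*, a*, b*) values and a per-pixel fixation count derived from the eye tracker. Normalizing by the total number of fixations, and falling back to the unweighted mean when there is no gaze data, are assumptions.

```python
import math


def delta_e_ab(lab1, lab2):
    """CIE 1976 color difference between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)


def gaze_weighted_difference(img1, img2, fixation_counts):
    """Mean ΔEab over pixels, each weighted by its share of recorded fixations.

    img1, img2: equal-length lists of (L*, a*, b*) tuples (flattened images).
    fixation_counts: per-pixel fixation frequencies from the eye tracker.
    """
    total = sum(fixation_counts)
    if total == 0:
        # no gaze data: fall back to the unweighted mean difference
        return sum(delta_e_ab(p, q) for p, q in zip(img1, img2)) / len(img1)
    return sum(w * delta_e_ab(p, q)
               for p, q, w in zip(img1, img2, fixation_counts)) / total
```

A color error in a region that observers look at often now contributes more to the overall score than the same error in a region they ignore.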
In this work, we studied how video compression and lightness scaling interact to affect the overall video quality and the
color quality attributes. We examined three subjective attributes: perceived color preference, perceived color
naturalness, and overall annoyance as digital videos were subjected to compression and lightness scaling.
Psychophysical experiments were carried out in which naïve subjects made numerical judgments of the three subjective
attributes. We found that preference and naturalness scores are concave down functions of mean lightness with an
associated maximum, while annoyance scores are concave up with an associated minimum. As compression increases,
both preference and naturalness scores decrease and vary less with mean lightness. Maximum preference, naturalness,
and annoyance scores generally occur at similar mean lightness values. Preference, naturalness, and annoyance scores
for individual videos are approximated relatively well by Gaussian functions of mean lightness. Preference and
naturalness scores decrease, while annoyance scores increase, as an S-shaped function of the logarithm of the total
squared error. A three-parameter model is shown to provide a good description of how each attribute depends on
lightness and compression for individual videos. Model parameters vary with video content.
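The Gaussian approximation of scores versus mean lightness can be sketched with a classic log-parabola fit: fit ln(score) = a + b·x + c·x² by least squares, then read off the peak position, width, and height. This fitting method is a swapped-in technique that assumes strictly positive, peaked scores such as preference or naturalness; it is not necessarily the authors' procedure, and the example data are synthetic.

```python
import math


def fit_gaussian(xs, ys):
    """Fit y = A * exp(-(x - mu)**2 / (2 * sigma**2)) via a parabola fitted to ln(y).

    Returns (A, mu, sigma). Requires ys > 0 and a peaked (concave) log profile.
    """
    x0 = sum(xs) / len(xs)  # center x to keep the normal equations well conditioned
    us = [x - x0 for x in xs]
    zs = [math.log(y) for y in ys]
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(z * u ** k for u, z in zip(us, zs)) for k in range(3)]
    # augmented normal equations for ln y = a + b*u + c*u**2
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    if c >= 0:
        raise ValueError("scores have no interior maximum; a Gaussian does not apply")
    mu = x0 - b / (2 * c)             # peak position (lightness of maximum score)
    sigma = math.sqrt(-1 / (2 * c))   # width of the preferred lightness range
    amp = math.exp(a - b * b / (4 * c))  # score at the peak
    return amp, mu, sigma
```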
An image compression approach capable of exploiting redundancies in groups of images is introduced. The
approach is based on image segmentation, texture analysis and texture synthesis. The proposed algorithm
extracts textured regions from an image and merges them with similar texture data from other images, in order
to take advantage of textural re-occurrences between the images. The texture extraction is done by taking
overlapping rectangular texture parameter samples from the input image(s), and using a clustering algorithm
to merge them into spatially connected regions, resulting in a polygonal texture map. The textures of that
map are then analysed by extracting various features from the texture regions. Using a metric defined
on these features, the textures are then merged with entries from a central database, which consists of all the
textures in all the images of the image collection, so that for each image, only a polygonal segmentation map
and references into this texture database need to be stored. Decoding (decompression) works by extracting the
polygonal texture map followed by filling the map regions with patterns generated using texture synthesis based
on the texture feature vectors from the database.
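The database-merging step can be sketched as nearest-neighbor assignment with a distance threshold: a region reuses the closest existing database texture if it is similar enough, and otherwise adds a new entry. The two-element feature vector (mean and standard deviation of pixel values), the Euclidean metric, and `max_distance` are hypothetical stand-ins for the paper's texture features and metric.

```python
import math


def texture_features(samples):
    """Toy feature vector for a texture region: mean and standard deviation of its pixels."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    return (mean, std)


def assign_to_database(region_features, database, max_distance):
    """Return a database index for each region, appending unmatched textures.

    Only these indices (plus the polygonal map) would need to be stored per image.
    """
    refs = []
    for feat in region_features:
        best, best_d = None, math.inf
        for idx, entry in enumerate(database):
            d = math.dist(feat, entry)
            if d < best_d:
                best, best_d = idx, d
        if best is None or best_d > max_distance:
            database.append(feat)  # new texture shared by later images
            best = len(database) - 1
        refs.append(best)
    return refs
```

Calling `assign_to_database` once per image with the same `database` list lets textures recurring across the collection share a single entry.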
We describe a set of techniques for mapping one image to another based on the statistics of a training set. We
apply these techniques to the problems of image denoising and superresolution, but they should also be useful
for many vision problems where training data are available. Given a local feature vector computed from an
input image patch, we learn to estimate a subband coefficient of the output image conditioned on the patch.
This entails approximating a multidimensional function, which we make tractable by nested binning and linear
regression within bins. This method performs as well as nearest neighbor techniques, but is much faster. After
attaining this local (patch based) estimate, we force the marginal subband histograms to match a set of target
histograms, in the style of Heeger and Bergen.1 The target histograms are themselves estimated from the
training data. With the combined techniques, denoising performance is similar to state of the art techniques
in terms of PSNR, and is slightly superior in subjective quality. In the case of superresolution, our techniques
produce higher subjective quality than the competing methods, allowing us to attain large increases in apparent
resolution. Thus, for these two tasks, our method is very fast and very effective.
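The marginal histogram-matching step can be sketched with rank-order matching, a simple variant of the CDF inversion used by Heeger and Bergen: each coefficient is replaced by the target sample at the same relative rank. This version snaps to the nearest lower target quantile instead of interpolating, and the nested-binning regression stage is not shown.

```python
def match_histogram(values, target):
    """Remap values so their empirical distribution matches that of target.

    Each value is replaced by the target sample at the same relative rank
    (nearest lower quantile, no interpolation); the ordering of values is kept.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ref = sorted(target)
    out = [0.0] * len(values)
    denom = max(len(values) - 1, 1)
    for rank, i in enumerate(order):
        out[i] = ref[rank * (len(ref) - 1) // denom]
    return out
```

For example, `match_histogram([3, 1, 2], [10, 20, 30])` keeps the ranking of the inputs but takes its values from the target histogram, giving `[30, 10, 20]`.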
Natural images are meaningful to humans - the physical world exhibits statistical regularities that permit the
human visual system (HVS) to infer useful interpretations. These regularities communicate the visual structure
of the physical world and govern the statistics of images (image structure). A signal processing framework is
sought to analyze image characteristics for a relationship with human interpretation. This work investigates
the first step toward an objective visual information evaluation: predicting the recognition threshold of different
image representations. Given an image sequence whose images begin as unrecognizable and are gradually refined
to include more information according to some measure, the recognition threshold corresponds to the first image
in the sequence in which an observer accurately identifies the content. Sequences are produced using two
types of image representations: signal-based and visual structure preserving. Signal-based representations add
information as dictated by conventional mathematical characterizations of images based on models of low-level
HVS processing and use basis functions as the basic image components. Visual structure preserving representations
add information to images attributed to visual structure and attempt to mimic higher-level HVS
processing by considering the scene's objects as the basic image components. An experiment is conducted to
identify the recognition threshold image. Several full-reference perceptual quality assessment algorithms are
evaluated in terms of their ability to predict the recognition threshold of different image representations. The
cross-correlation component of a modified version of the multi-scale structural similarity (MS-SSIM) metric,
denoted MS-SSIM*, exhibits a better overall correlation with the signal-based and visual structure preserving
representations' average recognition thresholds than the standard MS-SSIM cross-correlation component. These
findings underscore the significance of visual structure in recognition and advocate a multi-scale image structure
analysis for a rudimentary evaluation of visual information.
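For reference, the sketch below computes the standard SSIM structure (cross-correlation) term, s = (σxy + C3) / (σx·σy + C3) with C3 = C2/2 and C2 = (K2·L)², and combines it across scales with the published MS-SSIM exponents. It is not the paper's modified MS-SSIM*: statistics are computed over whole images instead of local windows, 2x2 averaging replaces the low-pass filter, and negative correlations are clipped to zero so that fractional powers stay real; all three are simplifying assumptions. With the default five scales, inputs need at least 32 pixels per side.

```python
import math

K2, L = 0.03, 255.0
C3 = (K2 * L) ** 2 / 2  # SSIM structure-term constant, C3 = C2 / 2


def structure_term(x, y):
    """SSIM cross-correlation (structure) term over whole images (no sliding window)."""
    a = [v for row in x for v in row]
    b = [v for row in y for v in row]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sa = math.sqrt(sum((v - ma) ** 2 for v in a) / (n - 1))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b) / (n - 1))
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)
    return (sab + C3) / (sa * sb + C3)


def downsample(img):
    """2x2 block average, a simple stand-in for low-pass filtering plus decimation."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]


def multiscale_structure(x, y, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)):
    """Product over scales of max(s_j, 0) ** w_j, from finest to coarsest."""
    result = 1.0
    for w in weights:
        result *= max(structure_term(x, y), 0.0) ** w  # clip negatives (assumption)
        x, y = downsample(x), downsample(y)
    return result
```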
Human perception of image distortions has been widely explored in recent years; however, research has not
dealt with distortions due to geometric operations. As a consequence, there is a lack of objective visual quality
measures for this class of distortions. In this paper we propose a method of objectively assessing the perceptual
quality of geometrically distorted images. Our approach is based on the theory of Markov Random Fields. The
idea is that the potential function of the Markov Random Field describing the distortion gives an indication
of the degradation of the distorted image. This work can be seen as the first step toward the definition of an
objective metric for geometric distortions in images.
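A toy illustration of the idea, assuming the geometric distortion is described by a dense displacement field and using a quadratic pairwise potential between 4-neighbors. This potential is a hypothetical choice; the abstract does not give the paper's potential function. Under it, a global translation has zero potential, while local jitter of the same amplitude is penalized.

```python
def displacement_potential(field):
    """Sum of squared differences between 4-neighbor displacement vectors.

    field: list of rows of (dx, dy) tuples giving how far each pixel was moved.
    """
    h, w = len(field), len(field[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            dx, dy = field[y][x]
            # look only down and right so each neighbor pair is counted once
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w:
                    ex, ey = field[ny][nx]
                    energy += (dx - ex) ** 2 + (dy - ey) ** 2
    return energy
```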
We discuss a new approach for lossy compression of bilevel images based on Markov random fields (MRFs).
The goal is to preserve key structural information about the image, and then reconstruct the smoothest image
that is consistent with this information. The image is compressed by losslessly coding the pixels in a square
grid of lines and adding bits when needed to preserve structural information. The decoder uses the MRF model
to reconstruct the interior of each block bounded by the grid, based on the pixels on its boundary, plus the
extra bits provided for certain blocks. The idea is that, as long as the key structural information is preserved,
the smooth contours of the block with the highest probability under the MRF provide acceptable
reconstructions. We propose and consider objective criteria for both encoding and evaluating the quality and
structure preserving properties of the coded bilevel images. These include mean-squared error, MRF energy
(smoothness), and connected components (topology). We show that overall, for comparable mean-squared error,
the new approach provides perceptually superior reconstructions to existing lossy compression techniques at
lower encoding rates.
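The decoder's principle can be illustrated on a single block: with the boundary pixels known, choose the interior that minimizes an Ising-style MRF energy, here the number of unequal 4-neighbor pairs. Brute-force enumeration is only feasible for tiny blocks and stands in for whatever optimization the paper uses; pixels marked `None` are unknown. The two structural criteria named above, MRF energy (smoothness) and connected components (topology), are included as well.

```python
from itertools import product


def mrf_energy(img):
    """Number of horizontally or vertically adjacent pixel pairs with different values."""
    h, w = len(img), len(img[0])
    e = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w and img[y][x] != img[y][x + 1]:
                e += 1
            if y + 1 < h and img[y][x] != img[y + 1][x]:
                e += 1
    return e


def reconstruct_block(block):
    """Fill None pixels with the lowest-energy (smoothest) binary assignment."""
    holes = [(y, x) for y, row in enumerate(block) for x, v in enumerate(row) if v is None]
    best, best_e = None, None
    for bits in product((0, 1), repeat=len(holes)):  # exhaustive search over the interior
        trial = [row[:] for row in block]
        for (y, x), b in zip(holes, bits):
            trial[y][x] = b
        e = mrf_energy(trial)
        if best_e is None or e < best_e:
            best, best_e = trial, e
    return best


def connected_components(img, value=1):
    """Count 4-connected components of pixels equal to value."""
    h, w = len(img), len(img[0])
    seen, count = set(), 0
    for y in range(h):
        for x in range(w):
            if img[y][x] == value and (y, x) not in seen:
                count += 1
                stack = [(y, x)]
                seen.add((y, x))
                while stack:  # flood fill the current component
                    cy, cx = stack.pop()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == value \
                                and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count
```

For a 4x4 block whose boundary is black on the left and white on the right, the smoothest interior continues the straight vertical edge, and the foreground remains a single connected component.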
This paper describes an experiment examining subjective ratings in response to variations in the reproduction quality of a
video signal. Additionally, the test was designed to examine if pricing affected subjective judgements. Test materials
were created with either constant quality or variable quality where quality was manipulated by reference to the video
frame rate. Subjects were required to provide both quality and acceptability ratings for each test sequence. Two levels of
variable quality were created: one in which the quality varied between medium and high quality (low variability), the
other being variability between low and high quality (high variability). Subjects were assigned to one of three price
bands prior to beginning the test. The test found that, for equivalent average quality sequences, subjects preferred
constant quality to high variability. There was no difference in ratings for constant quality and low variability sequences.
The results indicate that video encoding methods may take advantage of some variation in video quality, provided the
perceptual impact of the changes in quality is not marked.
An image content adaptation method for visually impaired people, based on the MPEG-21 Digital Item Adaptation (DIA)
standard, is proposed. The content adaptation mainly considers the spatial contrast vision characteristics of users, which are
represented by a contrast sensitivity function (CSF). The paper makes three key contributions. First, the visual
perception of users with different spatial contrast vision abilities is simulated by incorporating the HVS model
proposed by Pattanaik et al. Second, to measure spatial contrast vision, and thus realize personalized content
adaptation according to the severity of each individual user's visual impairment, the CSF is measured in a computer-based
environment. The measured spatial contrast vision symptom and its severity are represented in an interoperable way
using an example of an extended description tool provided by the MPEG-21 DIA specification. Third, a personalized content
adaptation is proposed, in the sense that the adapted content is optimized for the given
description of a particular symptom and its severity. To assess the effectiveness of the proposed methods, we performed
a number of experiments targeting users with low vision and showed how to determine and describe the CSF
parameters. Furthermore, a statistical experiment was performed to verify the effectiveness of the proposed adaptation
process for users with the low vision symptom.
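To illustrate what a CSF describes, the sketch below uses the Mannos-Sakrison analytic fit as a stand-in; it is not the Pattanaik et al. model used in the paper. The `csf_low_vision` function is a purely hypothetical illustration that compresses the normal curve toward lower frequencies to mimic reduced acuity; it is not a clinical model and not the paper's measured CSF.

```python
import math

def csf_mannos_sakrison(f_cpd):
    """Normalized contrast sensitivity at spatial frequency f (cycles/degree),
    using the Mannos-Sakrison analytic fit as a stand-in CSF."""
    a = 0.114 * f_cpd
    return 2.6 * (0.0192 + a) * math.exp(-(a ** 1.1))

def csf_low_vision(f_cpd, scale=2.0):
    """Hypothetical low-vision CSF: the normal curve compressed toward lower
    frequencies by `scale` (scale > 1 means poorer acuity). Illustrative only."""
    return csf_mannos_sakrison(f_cpd * scale)

def peak_frequency(csf, freqs):
    """Frequency (from a list of candidates) at which `csf` is largest."""
    return max(freqs, key=csf)
```

With integer frequencies from 1 to 30 cpd, the stand-in curve peaks near 8 cpd, and the hypothetical low-vision curve with `scale=2.0` peaks near 4 cpd, showing how a user-specific CSF shifts the band of frequencies worth preserving during adaptation.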
This paper proposes a colour preference control framework designed to satisfy consumers' colour-related emotions. A colour
harmony algorithm based on two-colour combinations is developed for displaying images that contain several complementary colour pairs, each treated as a two-colour combination. The colours of pixels belonging to complementary colour areas in HSV colour space are shifted toward the target hue colours, while the other pixels are left unchanged. With the proposed hue conversion, dynamic emotions can be enhanced, and the controlled output image shows improved colour emotions in terms of the human viewer's preference. Psychophysical experiments were conducted to determine the optimal model parameters for producing the image that users find most pleasant in terms of colour emotion.
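The per-pixel hue conversion described above (shift pixels inside a colour area toward a target hue, leave the rest untouched) can be sketched with Python's standard `colorsys` module. The area half-width `width`, the blending fraction `alpha`, and the choice of source and target hues are hypothetical parameters standing in for the model parameters the paper tunes psychophysically.

```python
import colorsys

def hue_distance(h1, h2):
    """Shortest distance between two hues on the unit hue circle."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)

def shift_toward(h, target, alpha):
    """Move hue h a fraction alpha of the way toward target, the short way round."""
    diff = ((target - h + 0.5) % 1.0) - 0.5  # signed difference in [-0.5, 0.5)
    return (h + alpha * diff) % 1.0

def harmonize_pixel(rgb, source_hue, target_hue, width=0.08, alpha=0.5):
    """Shift an 8-bit RGB pixel toward target_hue if its hue lies within
    `width` of source_hue; otherwise return it unchanged (hypothetical rule)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s == 0 or hue_distance(h, source_hue) > width:
        return rgb  # achromatic or outside the colour area: no change
    h = shift_toward(h, target_hue, alpha)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))
```

Saturation and value are preserved, so only hue changes, which matches the abstract's description of shifting toward target hue colours.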
In this study, two experiments were conducted to clarify the relation between RGB values and perceived blackness. In
the first experiment, we determined the average RGB values of black surface areas in the test stimuli at which observers begin to perceive
the areas as 'black', and the average RGB values at which observers perceive them as 'really black'.
Results indicate that, in some pictures, realizing a 'really black' surface requires RGB values lower than those of the
original image. The second experiment investigated how, and to what degree, the RGB values of black areas affect
the visual impression of artistic pictures. Three dimensions, "high-quality axis", "mysterious axis", and
"feeling of material axis", were extracted by factor analysis. The results indicate that the Art students seem to be more
sensitive in their evaluations along the "high-quality axis" and "mysterious axis" than the Engineering students, while the
opposite tendency is shown in the evaluations along the "feeling of material axis".
The variety of displays used to browse and view images has created a need to adapt an image representation to
the constraints of the viewing environment. In this paper, various methods of adapting to a small display size are
introduced, with a focus on the adaptation of document images.
Compared to photographic images, document images are even more challenging to represent on small displays.
If a typical down-sampling of the image data is performed, we lose not only high-resolution data but also semantic
information, such as the readability, recognizability, and distinguishability of features.
We explore various ways of controlling document information, such as readable text or distinguishable layout features, in
different visualizations by applying specific content-dependent scaling methods. Readability is preserved in "SmartNails"
via automatic content-dependent cropping, scaling, and pasting. Content-dependent iconification is proposed to provide
distinguishability between the layout features of document images. For multi-page document content, a rendering
in the form of a video clip is proposed that navigates through the image data in a content-dependent way, given display-size
and time constraints.
Image thumbnails are used in most imaging products and applications, where they allow a quick preview of the content of
the underlying high-resolution images. The question "How would you best represent a high-resolution original image
given a fixed number of thumbnail pixels?" is addressed using both automatically and manually generated thumbnails.
Automatically generated thumbnails that preserve the image quality of the high-resolution originals are first reviewed and
subjectively evaluated. These thumbnails allow interactive identification of image quality, while still letting
viewers apply their own knowledge to select the desired subject matter. Images containing textures are, however, difficult for the automatic
algorithm, so textured images are further studied by using photo editing to generate representative thumbnails manually.
The automatic thumbnails are subjectively compared to standard (filter and subsample) thumbnails using clean, blurry,
noisy, and textured images. Results from twenty subjects show that the automatic thumbnails are more representative of their
originals for blurry images. In addition, as desired, there is little difference between the automatic and standard thumbnails
for clean images. The noise component improves the results for noisy images but degrades the results for textured images.
In the further study of textured images, the manual thumbnails were subjectively compared to standard thumbnails for four
images. Evaluation using forty judgments found a bimodal distribution of preference between the standard and the manual
thumbnails, with some observers preferring the manual thumbnails and others the standard thumbnails.
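The "standard (filter and subsample)" baseline mentioned above can be sketched as block averaging, i.e. a box filter followed by subsampling. The abstract does not say which filter was used, so the box filter is an assumption; the image is a grayscale list of rows, and trailing rows or columns that do not fill a whole block are dropped.

```python
def box_thumbnail(img, factor):
    """Filter-and-subsample thumbnail: average each factor x factor block
    of a grayscale image (list of rows) into one thumbnail pixel."""
    h, w = len(img), len(img[0])
    th, tw = h // factor, w // factor
    out = []
    for ty in range(th):
        row = []
        for tx in range(tw):
            block = [img[ty * factor + dy][tx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))  # box-filter average
        out.append(row)
    return out
```

A fine checkerboard collapses to uniform mid-grey under this filter, which illustrates why standard thumbnails tend to hide the noise, blur, and texture that quality-preserving thumbnails try to keep visible.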
This paper describes an experiment that studies perceived video quality, with the goal of better understanding
whether a temporal or a spatial MPEG-2 based adaptation method should be used for video transmission over variable
bandwidth. The research focused on how in-scene motion and camera motion relate to spatial as well as
temporal distortions in video sequences. Participants were tested on their sensitivity to, and appreciation of, spatial and
temporal distortions using the scale paradigm of direct comparison. Footage was shot to create video material of three
scenes with systematic manipulation of in-scene motion and camera motion, producing twelve different video
sequences. Results show a trend relating the two types of motion to the two types of distortion in video
sequences. The main result indicates that participants generally rated sequences with spatial distortions as having better video quality than the
same sequences with temporal distortions, even though the sequences with spatial distortions were
coded at a lower overall bitrate than those with temporal distortions.
Despite the growth in the network capacity of wireless in-home networks, these networks often have insufficient capacity to
support multiple simultaneous audio/video streams. The unpredictable behavior of these networks results in a drop in video
quality for the end-user. One method for reducing the claim of an individual A/V stream on the network capacity is
controlled frame dropping. However, controlled frame dropping will only be accepted if its effect on the quality that end-users
experience is minimized. In this paper, we define an objective quality metric for frame dropping methods to
determine when frame dropping is no longer effective. The quality metric, a fraction between 0 and 1, is related to
the characteristics of frame dropping. A quality level below 0.9 indicates that a detectable number of frames has been
dropped, while a quality level above 0.98 indicates that no significant frame drops have occurred recently. The metric is validated
with simulations.
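The abstract does not define the metric itself, so the following is only a hypothetical stand-in that behaves as described: a value between 0 and 1 that falls when frames are dropped and recovers when they are delivered. It uses an exponentially weighted average of frame delivery with an assumed smoothing factor `lam`, and applies the 0.9 and 0.98 thresholds quoted above.

```python
def quality_trace(delivered, lam=0.95, q0=1.0):
    """Hypothetical quality level in [0, 1]: exponentially weighted average of
    per-frame delivery (1 = frame shown, 0 = frame dropped)."""
    q = q0
    trace = []
    for ok in delivered:
        q = lam * q + (1 - lam) * (1.0 if ok else 0.0)
        trace.append(q)
    return trace

def classify(q):
    """Map a quality level to the two thresholds given in the abstract."""
    if q < 0.9:
        return "detectable drops"
    if q > 0.98:
        return "no significant recent drops"
    return "intermediate"
```

With `lam=0.95`, a single dropped frame pulls the level to about 0.95 (between the thresholds), three consecutive drops take it below 0.9, and roughly 18 or more subsequent delivered frames restore it above 0.98. The choice of `lam` sets how long a drop remains "recent".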
Achieving ultimate visual realism of natural images on a display requires high resolution, so that artifacts due to finite
image resolution are undetectable. An image resolution of 30 cycles/degree (cpd), or one pixel per arc-minute, is often used as
the criterion for viewing conditions when assessing displayed image quality. The reasoning is that if the pixel size is smaller
than the minimum separable angle of normal (20/20) vision, the pixel structure is invisible and does not degrade image
quality. However, it is not clear whether 30 cpd resolution is adequate to prevent visible artifacts, especially for observers
with better than 20/20 vision. We conducted experiments to find the threshold resolution of natural images and its
dependence on visual acuity. Three objects were used; each object was presented 60 times at 5 resolutions (19.5, 26, 39,
52, or 78 cpd) next to the same image at a resolution of 156 cpd. Forty-five observers with visual acuity of 20/20 or
better made a forced-choice judgment between the images of each pair, indicating which image
appeared to have the higher resolution. The results show that the mean resolution for 75%
correct responses in each visual acuity group increased from above 30 cpd as visual acuity improved and
reached a plateau at 40-50 cpd at -0.3 logMAR.
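The unit conversions behind these figures can be made explicit. A display's highest reproducible frequency is half its pixels per degree (one cycle needs two pixels), so one pixel per arc-minute (60 pixels/degree) corresponds to 30 cpd. LogMAR expresses the minimum angle of resolution (MAR) in arc-minutes as a base-10 logarithm, so -0.3 logMAR is roughly 20/10 vision. The pixel pitch and viewing distance below are example inputs, not values from the study.

```python
import math

def pixels_per_degree(pixel_pitch_mm, distance_mm):
    """Pixels per degree of visual angle for a given pitch and viewing distance."""
    pixel_deg = math.degrees(2 * math.atan(pixel_pitch_mm / (2 * distance_mm)))
    return 1.0 / pixel_deg

def nyquist_cpd(pixel_pitch_mm, distance_mm):
    """Highest displayable spatial frequency: one cycle needs two pixels."""
    return pixels_per_degree(pixel_pitch_mm, distance_mm) / 2

def logmar_to_snellen_denominator(logmar):
    """Snellen denominator for a 20-foot test distance (0 logMAR -> 20/20)."""
    return 20 * 10 ** logmar

def grating_cpd_for_logmar(logmar):
    """Grating frequency whose half-cycle equals the MAR (in arc-minutes)."""
    mar_arcmin = 10 ** logmar
    return 30 / mar_arcmin
```

For example, a pixel that subtends exactly one arc-minute at 500 mm (a pitch of about 0.145 mm) gives `nyquist_cpd` of 30, and `logmar_to_snellen_denominator(-0.3)` is about 10.02, i.e. roughly 20/10.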
We propose a novel algorithm for unsupervised segmentation of color images. The proposed approach uses a dynamic
color gradient thresholding scheme that guides the region growing process. Given a color image, a weighted vector-based
color gradient map is generated. Seeds are identified, and a dynamic threshold is then used to perform reliable
growing of regions on the weighted gradient map. Over-segmentation, if any, is addressed by a similarity-measure-based
region merging stage to produce the final segmented image. Comparative results demonstrate the effectiveness of
this algorithm for color image segmentation.
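The first two stages described above (a weighted vector color gradient map, then seeded region growing under a threshold) can be sketched as follows. The abstract specifies neither the gradient weighting nor how the threshold changes, so this sketch assumes per-channel forward differences combined with hypothetical channel `weights`, and a threshold schedule that is relaxed step by step. The similarity-based merging stage is omitted.

```python
import math
from collections import deque

def color_gradient(img, weights=(1.0, 1.0, 1.0)):
    """Weighted vector gradient magnitude of an RGB image (list of rows of
    (r, g, b) tuples), using forward differences in each channel."""
    h, w = len(img), len(img[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x2, y2 = min(x + 1, w - 1), min(y + 1, h - 1)
            total = 0.0
            for c, wc in enumerate(weights):
                gx = img[y][x2][c] - img[y][x][c]
                gy = img[y2][x][c] - img[y][x][c]
                total += wc * (gx * gx + gy * gy)
            grad[y][x] = math.sqrt(total)
    return grad

def _grow(labels, grad, queue, t):
    """Spread labels from queued pixels into unlabelled 4-neighbours
    whose gradient does not exceed threshold t."""
    h, w = len(grad), len(grad[0])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0 and grad[ny][nx] <= t:
                labels[ny][nx] = labels[y][x]
                queue.append((ny, nx))

def grow_regions(grad, thresholds):
    """Seeded region growing with a threshold relaxed step by step (assumed
    schedule). Label 0 marks pixels never absorbed (high-gradient edges)."""
    h, w = len(grad), len(grad[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for t in sorted(thresholds):
        # 1) existing regions absorb neighbours admissible at this level
        _grow(labels, grad, deque((y, x) for y in range(h) for x in range(w) if labels[y][x]), t)
        # 2) remaining admissible pixels seed new regions
        for y in range(h):
            for x in range(w):
                if labels[y][x] == 0 and grad[y][x] <= t:
                    labels[y][x] = next_label
                    next_label += 1
                    _grow(labels, grad, deque([(y, x)]), t)
    return labels
```

Growing existing regions before seeding new ones at each threshold level favours a few large, stable regions over many small ones. Any remaining over-segmentation would be left to the merging stage.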
Eye tracking, as a quantitative method for collecting eye movement data, requires accurate knowledge of the eye
position, since eye movements can provide indirect evidence about what the subject sees. In this study, two eye tracking
devices were compared: a Head-mounted Eye Tracking Device (HED) and a Remote Eye Tracking Device (RED).
The precision of both devices was evaluated in terms of gaze position accuracy and stability of the calibration. For
the HED, we investigated how to register the data to real-world coordinates. This is needed because coordinates
collected by the HED eye tracker are relative to the position of the subject's head and not, as with the RED device, relative to the actual stimuli.
Results show that the precision degrades over time for both eye tracking
devices. The precision of the RED is better than that of the HED, and the difference between them is around 10-16 pixels (5.584
mm). The distribution of gaze positions for the HED and RED devices was expressed as the percentage of
points of regard falling in areas defined by the viewing angle. For both eye tracking devices, the gaze position accuracy was
95-99% at a 1.5-2° viewing angle. The stability of the calibration was investigated at the end of the experiment, and the
result was not statistically significant, but the distribution of gaze positions was larger at the end of the
experiment than at the beginning.
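The percentage-within-viewing-angle measure described above can be sketched by converting each gaze sample's on-screen offset from the target into degrees of visual angle. This sketch assumes gaze and target are already in screen pixel coordinates (as for the RED, or for HED data after registration), that the pixel pitch and viewing distance are known, and it uses the simple `atan(offset / distance)` approximation, which ignores off-axis geometry.

```python
import math

def pixel_offset_to_degrees(offset_px, pixel_pitch_mm, distance_mm):
    """Visual angle (degrees) subtended by an on-screen offset in pixels."""
    return math.degrees(math.atan(offset_px * pixel_pitch_mm / distance_mm))

def percent_within_angle(gaze, target, max_deg, pixel_pitch_mm, distance_mm):
    """Percentage of gaze samples (x, y in pixels) lying within max_deg
    of visual angle from the target point."""
    tx, ty = target
    inside = 0
    for gx, gy in gaze:
        offset = math.hypot(gx - tx, gy - ty)
        if pixel_offset_to_degrees(offset, pixel_pitch_mm, distance_mm) <= max_deg:
            inside += 1
    return 100.0 * inside / len(gaze)
```

Evaluating this at several angles (for example 1.0°, 1.5°, and 2.0°) gives the kind of accuracy-versus-viewing-angle profile summarized in the abstract.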
The Video Quality Experts Group (VQEG) is a group of experts from industry, academia, government and standards
organizations working in the field of video quality assessment. Over the last 10 years, VQEG has focused its efforts on
the evaluation of objective video quality metrics for digital video. Objective video metrics are mathematical models that
predict the picture quality as perceived by an average observer. VQEG has completed validation tests for full reference
objective metrics for the Standard Definition Television (SDTV) format. From this testing, two ITU Recommendations
were produced. This standardization effort is of great relevance to the video industries because objective metrics can be
used for quality control of the video at various stages of the delivery chain.
Currently, VQEG is undertaking several projects in parallel. The most mature project is concerned with objective
measurement of multimedia content. This project is probably the largest coordinated set of video quality testing ever
embarked upon. The project will involve the collection of a very large database of subjective quality data: about 40
subjective assessment experiments will be conducted, and more than 160,000 opinion scores collected. These will be used to
validate the proposed objective metrics. This paper describes the test plan for the project, its current status, and one of
the multimedia subjective tests.
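Validating an objective metric against subjective data amounts to comparing the metric's predictions with mean opinion scores (MOS). The sketch below computes MOS, the Pearson linear correlation, and RMSE. It is a simplified illustration and omits any fitting or mapping step a formal test plan may prescribe before these statistics are computed.

```python
import math

def mos(scores):
    """Mean opinion score of one test condition."""
    return sum(scores) / len(scores)

def pearson(x, y):
    """Pearson linear correlation between predictions x and subjective scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(pred, subjective):
    """Root-mean-square error between predicted and subjective scores."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, subjective)) / len(pred))
```

A correlation close to 1 and a small RMSE on the large subjective database would indicate that a candidate metric tracks what an average observer perceives.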
Producing off-line captions for deaf and hearing-impaired people is a labor-intensive task that can require up to 18
hours of production per hour of film. Captions are placed manually close to the region of interest, but they must avoid
masking human faces, text, or any moving objects that might be relevant to the story flow. Our goal is to use image
processing techniques to shorten the off-line caption production process by automatically placing the captions on the
proper consecutive frames. We implemented a computer-assisted captioning software tool that integrates detection of
faces, text, and visual motion regions. Near-frontal faces are detected using a cascade of weak classifiers and tracked
with a particle filter. Then, frames are scanned to perform text spotting and build a region map suitable for text
recognition. Finally, motion mapping is based on the Lucas-Kanade optical flow algorithm and provides MPEG-7
motion descriptors. The combined detected items are then fed to a rule-based algorithm that determines the best caption
locations for the related sequences of frames. This paper focuses on the rules defined to assist human captioners
and the results of a user evaluation of this approach.
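A rule-based placement step of the kind described above can be sketched as choosing, among candidate caption boxes, the one that overlaps detected regions least. The candidate boxes, the priority weights in `WEIGHTS` (faces over text over motion), and the tie-breaking rule are all hypothetical; the paper's actual rules are not given in the abstract. Detections would typically be the union of regions found across the frames the caption spans.

```python
from typing import NamedTuple

class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int

def overlap(a, b):
    """Area of intersection between two axis-aligned boxes."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0

# Hypothetical priorities: masking a face is worse than masking text or motion.
WEIGHTS = {"face": 3.0, "text": 2.0, "motion": 1.0}

def best_caption_box(candidates, detections):
    """Pick the candidate with the lowest weighted overlap with detected
    regions; `detections` maps a kind ('face', 'text', 'motion') to boxes.
    On ties, min() keeps the earliest candidate, so list preferred ones first."""
    def cost(box):
        return sum(WEIGHTS[kind] * overlap(box, d)
                   for kind, boxes in detections.items() for d in boxes)
    return min(candidates, key=cost)
```

Listing the conventional bottom-of-frame position first keeps captions in their usual place unless a face, text, or moving object would be covered.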
A breakthrough is needed to achieve substantial progress in the field of Content-Based Image Retrieval
(CBIR). This breakthrough can be brought about by: 1) optimizing user-system interaction, 2) combining the
wealth of techniques from text-based Information Retrieval with CBIR techniques, 3) exploiting human cognitive
characteristics, especially human color processing, and 4) conducting benchmarks with users to evaluate
new CBIR techniques. In this paper, these guidelines are illustrated by findings from our research conducted over
the last five years, which have led to the development of the online Multimedia for Art ReTrieval (M4ART)
system: http://www.m4art.org. The M4ART system follows the guidelines on all four issues and is assessed
on benchmarks using 5730 queries on a database of 30,000 images. M4ART can therefore be considered a
first step into a new era of CBIR.
In the context of medical display validation, a simulation chain has been developed to facilitate display design and
image quality validation. One important part is the human visual observer model, which quantifies the quality perception of
the simulated images. For several years, multiple research groups have been modeling the various aspects of human
perception to integrate them into a complete Human Visual System (HVS) model and developing visible image difference
metrics. In our framework, the JNDmetrix is used. It reflects human subjective assessment of image or video
fidelity. However, the system has limitations that make it unsuitable for our accurate simulations: it is restricted to 8-bit
integer RGB images, and its handling of display parameters such as gamma, black offset, and ambient light needs
to be extended. The solutions proposed to extend the HVS model are precision enhancement to overcome the 8-bit
limit, color space conversion between XYZ and RGB, and adaptation to the display parameters. The preprocessing
does not introduce any perceived distortion, for example from the precision enhancement. With this extension,
the model is used on a daily basis in the display simulation chain.
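Two of the extensions listed above, XYZ-to-RGB conversion and a display model with more than 8 bits of drive precision, can be sketched as follows. This does not call JNDmetrix. The conversion assumes sRGB primaries with a D65 white point (IEC 61966-2-1 matrix), which a real medical display may not have. The luminance model is a generic gain-offset-gamma form with illustrative parameter values (`gamma`, `l_max`, `l_black`, `l_ambient` in cd/m²), not values from the paper.

```python
def xyz_to_linear_srgb(X, Y, Z):
    """CIE XYZ (D65, Y of white = 1) to linear sRGB using the sRGB primaries."""
    M = ((3.2406, -1.5372, -0.4986),
         (-0.9689, 1.8758, 0.0415),
         (0.0557, -0.2040, 1.0570))
    return tuple(r[0] * X + r[1] * Y + r[2] * Z for r in M)

def display_luminance(level, bits=10, gamma=2.2, l_max=500.0, l_black=0.5, l_ambient=0.2):
    """Luminance (cd/m^2) for an integer drive level at a given bit depth:
    black offset plus reflected ambient light plus a gamma-shaped ramp.
    All parameter values are illustrative assumptions."""
    n = level / (2 ** bits - 1)  # normalized drive value; bits > 8 allowed
    return l_black + l_ambient + (l_max - l_black) * n ** gamma
```

Raising `bits` above 8 shrinks the luminance step between adjacent drive levels, which is the point of the precision enhancement; black offset and ambient light raise the floor that the observer model sees.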