Contrast energy was proposed by Watson, Barlow, and Robson (Nature, 1983) as a useful metric for representing luminance contrast target stimuli because it represents the detectability of the stimulus in photon noise for an ideal observer. We propose here the use of visible contrast energy metrics for detection and discrimination among static luminance patterns. Visibility is approximated with spatial frequency sensitivity weighting and eccentricity sensitivity weighting. The suggested weighting functions revise the Standard Spatial Observer (Watson and Ahumada, J. Vision, 2005) for luminance contrast detection, extend it into the near periphery, and provide compensation for duration. Under the assumption that detection is limited only by internal noise, both detection and discrimination performance can be predicted by metrics based on the visible energy of the difference images.
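A visible-energy metric of this kind can be sketched as follows. This is a minimal illustration, not the published model: the sensitivity weighting is left as a generic frequency-domain weight array (a stand-in for the spatial-frequency and eccentricity weighting described above), and the internal noise standard deviation is an assumed free parameter.

```python
import numpy as np

def visible_energy_dprime(img_a, img_b, csf_weights, noise_sd=1.0):
    """Sketch: d' from the visible energy of the difference image.

    img_a, img_b : 2-D luminance-contrast images (same shape).
    csf_weights  : frequency-domain sensitivity weights (same shape),
                   e.g. a CSF evaluated on the FFT frequency grid.
                   Here a placeholder for the weighting functions above.
    noise_sd     : assumed internal noise standard deviation.
    """
    diff = img_b - img_a
    # Weight the difference image's spectrum by visual sensitivity.
    spectrum = np.fft.fft2(diff) * csf_weights
    # By Parseval's relation, this is the visible energy of the
    # difference image (mean squared weighted amplitude).
    visible_energy = np.mean(np.abs(spectrum) ** 2) / diff.size
    # Under the internal-noise-limited assumption, d' grows with the
    # square root of visible energy.
    return np.sqrt(visible_energy) / noise_sd
```

With flat weights, doubling the contrast of the difference image doubles the predicted d', as expected of an energy-based detectability measure.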
Space operations present the human visual system with a wide dynamic range of images from faint stars and starlit
shadows to unattenuated sunlight. Lunar operations near the poles will result in low sun angles, exacerbating visual
problems associated with shadowing and glare. We discuss the perceptual challenges these conditions will present to the
human explorers, and consider some possible mitigations and countermeasures. We also discuss the problems of
simulating these conditions for realistic training.
Aviation display system designers and evaluators need to know how discriminable displayed symbols will be over a wide range of conditions to assess the adequacy and effectiveness of flight display systems. If flight display symbols are to be safely recognized by pilots, they must be easily
discriminated from each other. Sometimes psychophysical measurements can answer this question, but computational modeling may be required to assess the numerous conditions and even help design the
empirical experiments that may be needed. Here we present an image discrimination model that includes position compensation. The model takes as input the pixel luminance values of two symbol images and the effective viewing distance, and gives as output the discriminability in just-noticeable-differences (d')
and the x and y offset in pixels needed to minimize the discriminability. The model predictions are shown to be a useful upper bound for human symbol identification performance.
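The position-compensation step can be sketched as a search over small x/y pixel offsets for the shift that minimizes discriminability. This is an illustrative stand-in only: the per-offset d' here is simply a scaled RMS difference rather than the full model's filtered-difference metric, and the circular shift ignores boundary handling.

```python
import numpy as np

def min_dprime_offset(sym_a, sym_b, max_shift=2, k=1.0):
    """Sketch of position compensation: shift sym_b over a window of
    x/y pixel offsets and report the offset minimizing discriminability.
    k is an assumed overall sensitivity scale factor."""
    best = (np.inf, 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Circularly shift the second symbol (boundary effects ignored).
            shifted = np.roll(np.roll(sym_b, dy, axis=0), dx, axis=1)
            # Stand-in d': scaled RMS of the difference image.
            d = k * np.sqrt(np.mean((sym_a - shifted) ** 2))
            if d < best[0]:
                best = (d, dx, dy)
    return best  # (minimum d', x offset, y offset)
```

For two symbols that differ only by a one-pixel translation, the search finds the compensating offset and a residual d' of zero.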
KEYWORDS: Transparency, Error analysis, LCDs, Electronic imaging, Visualization, Heads up displays, Virtual reality, Human vision and color perception, Psychology, Calibration
Our previous experiments with additive and multiplicative transparent text on textured backgrounds show that readability can be more accurately predicted by adjusting the contrast with a contrast-gain-like divisive factor that includes the background RMS contrast. However, the factor performed poorly at predicting readability differences on two different patterned backgrounds. Using the same images as the previous study, we presented the target words alone and single letters cut out of the target words. We found that word identification and word discriminability were affected by the backgrounds in the same way that the paragraph search performance was affected, but that letter identifiability on these two backgrounds was predicted by the metric. We also found a significant improvement from including different contrast gains for positive and negative contrasts in the metric. Unfortunately, word readability is not necessarily simply related to letter identifiability and simple contrast measures.
KEYWORDS: Colorimetry, Spatial frequencies, Modulation, Data modeling, Contrast sensitivity, Visual process modeling, Distance measurement, Visualization, Human vision and color perception, Databases
The aim of the ColorFest is to extend the original ModelFest (http://vision.arc.nasa.gov/modelfest/) experiments to build a spatio-chromatic standard observer for the detection of static coloured images. The two major issues that need to be addressed are (1) the contrast sensitivity functions for the three chromatic mechanisms and (2) how the output of these channels is combined. We measured detection thresholds for stimuli modulated along different colour directions and for a wide range of spatial frequencies. The three main directions (an achromatic direction, a nominally isoluminant red-green direction, and the tritanopic confusion line) and four intermediate colour directions were used. These intermediate directions were the vector sums of the thresholds along the main directions. We evaluate two models. Detection performance is described by a linear transformation C defining the chromatic tuning and a diagonal matrix S reflecting the sensitivity of the chromatic mechanisms for a particular spatial frequency. The output of the three chromatic mechanisms is combined according to a Minkowski metric (General Separable Model), or according to a Euclidean Distance measure (Ellipsoidal Separable Model). For all three observers the ellipsoidal model fits as well as the general separable model. Estimating the chromatic tuning improves the model fit for one observer.
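The two pooling rules being compared can be written as one function with a Minkowski exponent. This is a schematic sketch under the separable-model assumptions stated above, with illustrative placeholder values for the tuning matrix C and sensitivity matrix S rather than fitted ones.

```python
import numpy as np

def predicted_sensitivity(stim_dir, C, S, beta=2.0):
    """Sketch of the separable detection model: the chromatic tuning
    matrix C maps a colour-direction vector into three mechanism
    responses, the diagonal matrix S scales them by each mechanism's
    sensitivity at the tested spatial frequency, and the outputs are
    pooled with Minkowski exponent beta.  beta = 2 gives the
    Ellipsoidal (Euclidean) Separable Model; other beta values give
    the General Separable Model.  Threshold is 1/sensitivity."""
    responses = S @ (C @ stim_dir)
    return np.sum(np.abs(responses) ** beta) ** (1.0 / beta)
```

With identity tuning, a stimulus confined to a single mechanism's direction is detected with exactly that mechanism's sensitivity, while intermediate directions probe the pooling rule.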
KEYWORDS: Target detection, Image filtering, RGB color model, Contrast sensitivity, Visual process modeling, Color vision, Data modeling, Linear filtering, Spatial frequencies, Error analysis
Masking of color targets was measured for fixed pattern noises made of all additive combinations of white/black, red/green, and blue/yellow noise. Results are compared with the predictions of a cone-contrast-based masking model with and without cross-channel masking. The model without cross-channel masking performed very well.
The image sequence discrimination model we use models optical blurring and retinal light adaptation. Two parallel channels, sustained and transient, with different masking rules based on contrast gain control, are used. Performance of the model was studied for two tasks representative of a video communication system with versions of monochrome H.263 compressed images. In the first study, five image sequences constituted pairs of non-compressed and compressed images to be discriminated with a 2-alternative-forced-choice method together with a staircase procedure. The thresholds for each subject were calculated. Analysis of variance showed that the differences between the pictures were significant. The model threshold was close to the average of the subjects for each picture, and the model thus predicted these results quite well. In the second study, the effect of transmission errors on the Internet, i.e., packet losses, was tested with the method of constant stimuli. Both the reference and comparison images were distorted. The task of the subjects was to judge whether the presented video quality was worse than the initially seen reference video. Two different quality levels of the compressed sequences were simulated. The differences in the thresholds among the different video scenes were to some extent predicted by the model. Category scales indicate that detection of distortions and overall quality judgements are based on different psychological processes.
Many image discrimination models are available for static images. However, in many applications temporal information is important, so image fidelity metrics for image sequences are needed as well. Ahumada et al. presented a discrimination model for image sequences. It is unusual in that it does not decompose the images into multiple frequency and orientation channels, which helps make it computationally inexpensive. It was evaluated for predicting psychophysical experiments measuring contrast sensitivity and temporal masking, with promising results. In this paper we investigate the performance of the above-mentioned model in a practical application: surveillance with IR imagery. Model evaluation is based on two-alternative forced-choice experiments, using a staircase procedure to control signal amplitude. The observer is presented with two one-second-duration IR-image sequences, one of which has an added target signal. The observer's task is to guess which sequence contained the target. While the target is stationary in the image center, the background moves in one direction, simulating a tracking station in which the observer has locked on to the target. The results show that the model, in four out of five cases, qualitatively has the desired behavior.
Several discriminability measures were correlated with reading speed over a range of screen backgrounds. Reading speed was measured using a search task in which observers tried to find one of three words in a short paragraph of black text. There were four background patterns combined with three colors at two intensities. The text contrast had a small positive correlation with speed. Background RMS contrast showed a stronger, negative correlation. Text energy in the spatial frequency bands corresponding to lines and letters also showed strong relationships. A general procedure for constructing a masking index from an image discrimination model is described and used to generate two example indices: a global masking index, based on a single filter model combining text contrast and background RMS contrast, and a spatial-frequency-selective masking index. These indices did not lead to better correlations than those of the RMS measures alone, but they should lead to better correlations when there are larger variations in text contrast and masking patterns.
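The global masking index described above can be sketched as a contrast-gain-like divisive adjustment of text contrast by background RMS contrast. The functional form and the semi-saturation constant below are illustrative assumptions, not the paper's fitted parameters.

```python
import numpy as np

def global_masking_index(text_contrast, bg_rms, c50=0.1):
    """Sketch of a global masking index: text contrast divided by a
    contrast-gain-like factor driven by background RMS contrast.
    c50, the semi-saturation contrast, is a placeholder value."""
    return text_contrast / np.sqrt(1.0 + (bg_rms / c50) ** 2)
```

On a blank background the index reduces to the text contrast itself; as background RMS contrast grows, the effective (masked) contrast falls, matching the negative correlation with reading speed reported above.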
Nonlinear contributions to pattern classification by humans are analyzed by using previously obtained data on discrimination between aligned lines and offset lines. We show that the optimal linear model can be rejected even when the parameters of the model are estimated individually for each observer. We use a new measure of agreement to reject the linear model and to test simple nonlinear operators. The first nonlinearity is position uncertainty. The linear kernels are shrunk to different extents and convolved with the input images. A Gaussian window weights the results of the convolutions, and the maximum in that window is selected as the internal variable. The size of the window is chosen so as to maintain a constant total amount of spatial filtering, i.e., the smaller kernels have a larger position uncertainty. The results for two observers indicate that the best agreement is obtained at a moderate degree of position uncertainty, about ±1 min of arc. Finally, we analyze the effect of orientation uncertainty and show that agreement can be further improved in some cases.
A computer vision algorithm was developed to detect moving aircraft located in video images. Using a gradient-based approach, the algorithm computes optical flow vectors in each frame of the sequence. Vectors with similar characteristics (location, magnitude, and direction) are clustered together using a spatial consistency test. Vectors that pass the spatial consistency test are extended temporally to make predictions about the optical flow locations, magnitudes, and directions in subsequent frames. The actual optical flow vectors that are consistent with the predicted vectors are labeled as vectors associated with a moving target. The algorithm was tested on images obtained with a video camera mounted below the nose of a Boeing 737. The algorithm correctly detected an aircraft from a distance of one mile in over 80% of the frames with zero false alarms.
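The spatial consistency test can be sketched as greedy grouping of flow vectors whose locations, magnitudes, and directions are similar. This is a simplified illustration of the clustering step only (not the gradient-based flow computation or the temporal prediction stage), and the tolerance values are placeholders.

```python
import numpy as np

def cluster_flow_vectors(points, flows, pos_tol=10.0, vec_tol=0.5):
    """Sketch of the spatial consistency test: greedily group optical
    flow vectors by similarity of location and of flow (magnitude and
    direction together, via the vector difference).
    points : (N, 2) vector locations; flows : (N, 2) flow vectors."""
    clusters = []
    for p, f in zip(points, flows):
        placed = False
        for cluster in clusters:
            q, g = cluster[0]  # compare against the cluster's seed vector
            if (np.linalg.norm(p - q) < pos_tol
                    and np.linalg.norm(f - g) < vec_tol):
                cluster.append((p, f))
                placed = True
                break
        if not placed:
            clusters.append([(p, f)])
    return clusters
```

Vectors passing the test end up in a common cluster, which the full algorithm then extends temporally to predict flow in subsequent frames; isolated or inconsistent vectors form singleton clusters and are discarded.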
We present a simplified dual-channel discrimination model with spatio-temporal filters to represent the visual system's contrast sensitivity, and masking based on local spatio-temporal contrast energy. The contrast sensitivity filter parameters of the model were based on previous work. The masking and global sensitivity parameters are calibrated to masking data using brief grating target signals masked by a 700 msec grating with the same spatial parameters.
Here we demonstrate a method for constructing stimulus classification images. These images provide information regarding the stimulus aspects the observer uses to segregate images into discrete response categories. Data are first collected on a discrimination task containing low contrast noise. The noises are then averaged separately for the stimulus-response categories. These averages are then summed with appropriate signs to obtain an overall classification image. We determine stimulus classification images for a vernier acuity task to visualize the stimulus features used to make these precise position discriminations. The resulting images reject the idea that the discrimination is performed by the single best discriminating cortical unit. The classification images show one Gabor-like filter for each line, rejecting the nearly ideal assumption of image discrimination models predicting no contribution from the fixed vernier line.
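The construction described above reduces to averaging the noise fields within each stimulus-response category and summing with appropriate signs. A minimal sketch, assuming two stimuli, two responses, and sign determined by the response:

```python
import numpy as np

def classification_image(noises, stimuli, responses):
    """Sketch: average the noise fields within each stimulus-response
    category, then sum the four category means with signs (+ for one
    response, - for the other) to obtain the classification image.
    noises    : (N, H, W) low-contrast noise fields, one per trial
    stimuli   : (N,) which stimulus was shown (0 or 1)
    responses : (N,) observer's response (0 or 1)
    """
    img = np.zeros(noises.shape[1:])
    for s in (0, 1):
        for r in (0, 1):
            mask = (stimuli == s) & (responses == r)
            if mask.any():
                sign = 1.0 if r == 1 else -1.0
                img += sign * noises[mask].mean(axis=0)
    return img
```

Intuitively, noise pixels that push responses one way accumulate with one sign and cancel elsewhere, so the resulting image estimates the spatial weighting the observer applies, such as the Gabor-like filters found for the vernier lines.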
The ability of a human observer to locate a lesion in natural medical image backgrounds (extracted from patients' x-ray coronary angiograms) is degraded by two major factors: (1) the noisy variations in the background, (2) the presence of a high contrast complex background (through pattern masking effects). The purpose of this paper is to isolate and model the effect of a deterministic complex background on visual signal detection in natural medical image backgrounds. We perform image discrimination experiments where the observers have to discriminate an image containing the background plus signal from an image containing the background only. Five different samples of medical image backgrounds were extracted from patients' digital x-ray coronary angiograms. On each trial, two images were shown sequentially, one image with the simulated contrast target and the other without. The observer's task was to select the image with the target. An adaptive staircase method was used to determine the sequence of signal contrasts presented, and the signal's energy thresholds were determined by maximum likelihood estimation. We tested the ability of single channel and multiple channel image discrimination models with a variety of contrast gain control mechanisms to predict the variation of the signal energy threshold in the different background samples. Human signal energy thresholds were best predicted by a multiple channel model with wide band masking.
Image discrimination models are used to predict the visibility of the difference between two images. Using a four category rating scale method, Rohaly et al. (SPIE Vol. 2411) and Ahumada & Beard (SPIE Vol. 2657) found that image discrimination models can predict target detectability when the background is kept constant, or 'fixed.' In experiment I, we use this same rating scale method and find no difference between 'fixed' and 'random' noise (where the white noise changes from trial to trial). In experiment II, we compare fixed noise and two random noise conditions. Using a two-interval forced-choice procedure, the 'random' noise was either the same or different in the two intervals. Contrary to image discrimination model predictions, the same random noise condition produced greater masking than the 'fixed' noise. This suggests that observers use less efficient target templates than image discrimination models implicitly assume. Also, performance appeared limited by internal process variability rather than external noise variability, since similar masking was obtained for both random noise types.
Experiments on visual detection in computer simulated noise (e.g., white noise) show that random variations from location to location in the image (due to noise) degrade human performance. Psychophysical experiments on visual detection of signals superimposed on a known deterministic background ('mask') show that human performance can be degraded by the presence of a high contrast deterministic background through divisive inhibition. The purpose of this paper is to perform a psychophysical experiment to determine the relative importance of these two sources of performance degradation (random background variations and contrast masking effects) in human visual detection in natural medical image backgrounds. The results show that both contrast masking and random background variations degrade human performance for detecting signals in natural medical image backgrounds. These results suggest that current observer models which do not account for the deterministic presence of the background might need to model such effects in order to reliably predict human visual detection in natural medical image backgrounds.
Observers viewed a simulated airport runway landing scene with an obstructing aircraft on the runway and rated the visibility of the obstructing object in varying levels of white fixed-pattern noise. The effect of the noise was compared with the predictions of single and multiple channel discrimination models. Without a contrast masking correction, both models predict almost no effect of the fixed-pattern noise. A global contrast masking correction improves both models' predictions, but the predictions are best when the masking correction is based only on the noise contrast (does not include the background image contrast).
Image compression based on quantizing the image in the discrete cosine transform (DCT) domain can generate blocky artifacts in the output image. It is possible to reduce these artifacts and RMS error by adjusting measures of block edginess and image roughness, while restricting the DCT coefficient values to values that would have been quantized to those of the compressed image. This paper presents a fast algorithm to replace our gradient search method for RMS error reduction and image smoothing after adjustment of DCT coefficient amplitude.
Object detection involves looking for one of a large set of object subimages in a large set of background images. Image discrimination models predict the probability that an observer will detect a difference between two images. We find that discrimination models can predict the relative detectability of objects in different images, suggesting that these simpler models may be useful in some object detection applications. Six images of a vehicle in an otherwise natural setting were altered to remove the vehicle and mixed with the original image in various proportions. Nineteen observers rated the 24 images for the presence of a vehicle. The pattern of observer detectabilities for the different images was predicted by three discrimination models. A Cortex transform discrimination model, a contrast sensitivity function filter model, and a root-mean-square difference predictor based on the digital image values gave prediction errors of 15%, 49%, and 46%, respectively. Two observers given the same images repeatedly to make the task a discrimination task rated the images similarly, but had detectabilities a factor of two higher.
Several recent image compression standards rely upon the discrete cosine transform (DCT). Models of DCT basis function visibility can be used to design quantization matrices for arbitrary viewing conditions and images. Here we report new results on the effects of viewing distance and contrast masking on basis function visibility. We measured contrast detection thresholds for DCT basis functions at viewing distances yielding 16, 32, and 64 pixels/degree. Our detection model has been elaborated to incorporate the observed effects. We have also measured detection thresholds for individual basis functions when superimposed upon another basis function of the same or a different frequency. We find considerable masking between nearby DCT frequencies. A model for these masking effects also is presented.
Image compression based on quantizing the image in the discrete cosine transform (DCT) domain can generate blocky artifacts in the output image. It is possible to reduce these artifacts and rms error by correcting DCT domain measures of block edginess and image roughness, while restricting the DCT coefficient values to values that would have been quantized to those of the compressed image.
A detection model is developed to predict visibility thresholds for discrete cosine transform coefficient quantization error, based on the luminance and chrominance of the error. The model is an extension of a previously proposed luminance-based model, and is based on new experimental data. In addition to the luminance-only predictions of the previous model, the new model predicts the detectability of quantization error in color space directions in which chrominance error plays a major role. This more complete model allows DCT coefficient quantization matrices to be designed for display conditions other than those of the experimental measurements: other display luminances, other veiling luminances, other spatial frequencies (different pixel sizes, viewing distances, and aspect ratios), and other color directions.
KEYWORDS: Quantization, Visual process modeling, Spatial frequencies, Image compression, Neon, Visualization, Visibility, Matrices, Data modeling, Human vision and color perception
A model is developed to approximate visibility thresholds for discrete cosine transform (DCT) coefficient quantization error based on the peak-to-peak luminance of the error image. Experimentally measured visibility thresholds for R, G, and B DCT basis functions can be predicted by a simple luminance-based detection model. This model allows DCT coefficient quantization matrices to be designed for display conditions other than those of the experimental measurements: other display luminances, other veiling luminances, and other spatial frequencies (different pixel spacings, viewing distances, and aspect ratios).
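The luminance-based threshold model can be sketched as a parabola in log spatial frequency, with the quantization step for each DCT coefficient set from the resulting threshold. The functional form follows the kind of model described above, but the parameter values below are illustrative placeholders, not the fitted ones.

```python
import numpy as np

def dct_threshold(freq, t_min=0.01, f_peak=4.0, k=1.5):
    """Sketch of a luminance-based threshold model for DCT basis
    functions: log threshold is parabolic in log spatial frequency,
    minimal near the peak of contrast sensitivity.  t_min, f_peak
    (cycles/degree), and the steepness k are assumed values."""
    return t_min * 10.0 ** (k * (np.log10(freq) - np.log10(f_peak)) ** 2)

def quantization_step(freq, **kw):
    """Quantization step set to twice the threshold, so that the worst
    case quantization error (half a step) stays at threshold."""
    return 2.0 * dct_threshold(freq, **kw)
```

Changing the viewing distance rescales each coefficient's spatial frequency in cycles/degree, so re-evaluating the same model yields a quantization matrix adapted to the new display conditions.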
This paper describes the design and operation of a new simulation model for color matrix display development. It models the physical structure, the signal processing, and the visual perception of static displays, to allow optimization of display design parameters through image quality measures. The model is simple, implemented in the Mathematica computer language, and highly modular. Signal processing modules operate on the original image. The hardware modules describe backlights and filters, the pixel shape, and the tiling of the pixels over the display. Small regions of the displayed image can be visualized on a CRT. Visual perception modules assume static foveal images. The image is converted into cone catches and then into luminance, red-green, and blue-yellow images. A Haar transform pyramid separates the three images into spatial frequency and direction-specific channels. The channels are scaled by weights taken from human contrast sensitivity measurements of chromatic and luminance mechanisms at similar frequencies and orientations. Each channel provides a detectability measure. These measures allow the comparison of images displayed on prospective devices and, by that, the optimization of display designs.
When models of human vision adequately measure the relative quality of candidate halftonings of an image, the problem of halftoning the image becomes equivalent to the search problem of finding a halftone that optimizes the quality metric. Because of the vast number of possible halftones, and the complexity of image quality measures, this principled approach has usually been put aside in favor of fast algorithms that seem to perform well. We find that the principled approach can lead to a range of useful halftoning algorithms, as we trade off speed for quality by varying the complexity of the quality measure and the thoroughness of the search. High quality halftones can be obtained reasonably quickly, for example, by using as a measure the vector length of the error image filtered by a contrast sensitivity function, and, as the search procedure the sequential adjustment of individual pixels to improve the quality measure. If computational resources permit, simulated annealing can find nearly optimal solutions.
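The speed/quality trade-off described above can be sketched with the simplest version of the search: sequential single-pixel flips accepted whenever they reduce a filtered-error quality measure. The frequency-domain filter below is a generic stand-in for a contrast sensitivity function, and the exhaustive per-pixel evaluation is deliberately naive.

```python
import numpy as np

def halftone_by_local_search(img, lowpass, n_sweeps=3):
    """Sketch of halftoning as search: flip individual pixels of a
    binary image whenever the flip reduces the vector length of the
    filtered error image.  `lowpass` is a frequency-domain weight
    array (same shape as img) standing in for a CSF."""
    def quality(ht):
        # Vector length of the CSF-filtered error image (lower is better).
        return np.linalg.norm(np.fft.fft2(ht - img) * lowpass)

    ht = (img > 0.5).astype(float)  # start from simple thresholding
    for _ in range(n_sweeps):
        changed = False
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                q0 = quality(ht)
                ht[i, j] = 1.0 - ht[i, j]        # trial flip
                if quality(ht) < q0:
                    changed = True               # keep the improvement
                else:
                    ht[i, j] = 1.0 - ht[i, j]    # revert
        if not changed:
            break                                # local optimum reached
    return ht
```

Replacing this greedy sweep with simulated annealing, or the simple filter with a more complete vision model, moves along the speed/quality trade-off discussed above.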
KEYWORDS: Sensors, Visualization, Human vision and color perception, Visual process modeling, Calibration, Image filtering, Spatial frequencies, Algorithm development, Image processing, Linear filtering
A network learning algorithm is presented that computes interpolation functions that can compensate for weakened, jittered, or missing elements of a sensor array. The algorithm corrects errors in translation invariance, so prior knowledge of the input images is not required.
An algorithm is described for learning image interpolation functions for sensor arrays
whose sensor positions are somewhat disordered. The learning is based on failures of
translation invariance, so it does not require knowledge of the images being presented to the
visual system. Previously reported implementations of the method assumed the visual system
to have precise knowledge of the translations. We demonstrate here that translation estimates
computed from the imperfectly interpolated images can have enough accuracy to allow the
learning process to converge to a correct interpolation.