New foundational ideas are used to define a novel approach to generic visual pattern recognition. These ideas
proceed from the starting point of the intrinsic equivalence of noise reduction and pattern recognition when
noise reduction is taken to its theoretical limit of explicit matched filtering. This led us to consider extending sparse coding, which uses basis-function transforms for both de-noising and pattern recognition, to the full pattern specificity of a lexicon of matched-filter pattern templates. A key hypothesis is that such a lexicon can
be constructed and is, in fact, a generic visual alphabet of spatial vision. Hence it provides a tractable solution
for the design of a generic pattern recognition engine. Here we present the key scientific ideas, the basic design
principles which emerge from these ideas, and a preliminary design of the Spatial Vision Tree (SVT). The latter
is based upon a cryptographic approach whereby we measure, over a large aggregate of images, an estimate of the frequency of occurrence (FOO) of each pattern. These distributions are employed together with Hamming distance criteria
to design a two-tier tree. Then, using information theory, these same FOO distributions are used to define a precise method for pattern representation. Finally, the experimental performance of the preliminary SVT on
computer generated test images and complex natural images is assessed.
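The FOO/Hamming machinery described above can be sketched compactly. The following is a minimal illustration, not the authors' implementation: it tallies frequency-of-occurrence counts for small binary edge patches and groups them into coarse first-tier nodes by Hamming distance to the most frequent patterns; the patch size and seed count are hypothetical.

```python
import numpy as np
from collections import Counter

def foo_distribution(binary_edge_maps, patch=3):
    """Tally frequency of occurrence (FOO) of binary patch patterns."""
    counts = Counter()
    for bm in binary_edge_maps:
        h, w = bm.shape
        for r in range(h - patch + 1):
            for c in range(w - patch + 1):
                counts[bm[r:r + patch, c:c + patch].tobytes()] += 1
    return counts

def hamming(a, b):
    return np.count_nonzero(np.frombuffer(a, np.uint8) !=
                            np.frombuffer(b, np.uint8))

def two_tier_tree(counts, n_seeds=16):
    """Tier 1: most frequent patterns as seeds; tier 2: members assigned
    to the nearest seed by Hamming distance."""
    seeds = [p for p, _ in counts.most_common(n_seeds)]
    tree = {s: [] for s in seeds}
    for p in counts:
        tree[min(seeds, key=lambda s: hamming(s, p))].append(p)
    return tree

# usage: random binary "edge maps" stand in for real training data
maps = [np.random.randint(0, 2, (32, 32), np.uint8) for _ in range(4)]
tree = two_tier_tree(foo_distribution(maps))
```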
KEYWORDS: Visualization, Image enhancement, Current controlled current source, Edge detection, Image processing, Information visualization, Signal to noise ratio, Image fusion, Spatial resolution, Pattern recognition
Adaptive methods are defined and experimentally studied for a two-scale edge detection process that mimics
human visual perception of edges and is inspired by the parvo-cellular (P) and magno-cellular (M) physiological
subsystems of natural vision. This two-channel processing consists of a high spatial acuity/coarse contrast
channel (P) and a coarse acuity/fine contrast (M) channel. We perform edge detection after a very strong
non-linear image enhancement that uses smart Retinex image processing. Two conditions that arise from
this enhancement demand adaptiveness in edge detection. These conditions are the presence of random noise
further exacerbated by the enhancement process, and the equally random occurrence of dense textural visual
information. We examine how to best deal with both phenomena with an automatic adaptive computation
that treats both high noise and dense textures as too much information, and gracefully shifts from small-scale to medium-scale edge pattern priorities. This shift is accomplished by using different edge-enhancement
schemes that correspond with the (P) and (M) channels of the human visual system. We also examine the
case of adapting to a third image condition, namely too little visual information, and automatically adjust edge
detection sensitivities when sparse feature information is encountered. When this methodology is applied to a
sequence of images of the same scene but with varying exposures and lighting conditions, this edge-detection
process produces pattern constancy that is very useful for several imaging applications that rely on image
classification in variable imaging conditions.
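To make the adaptive P/M shift concrete, here is a minimal sketch under our own assumptions: gradient-magnitude edge maps at two Gaussian scales stand in for the P- and M-channels, and an edge-density test decides when there is "too much" or "too little" information. The scales and thresholds are hypothetical, not the published computation.

```python
import numpy as np
from scipy import ndimage

def edge_map(img, sigma):
    """Gradient-magnitude edges at a given Gaussian blur scale."""
    g = ndimage.gaussian_filter(img.astype(float), sigma)
    gx, gy = np.gradient(g)
    return np.hypot(gx, gy)

def adaptive_two_channel(img, dense_frac=0.25, sparse_frac=0.02):
    fine = edge_map(img, sigma=1.0)    # P-like: high spatial acuity
    medium = edge_map(img, sigma=3.0)  # M-like: coarse acuity
    t = fine.mean() + fine.std()
    density = np.mean(fine > t)
    if density > dense_frac:           # high noise / dense texture:
        return medium > medium.mean() + medium.std()  # prefer medium scale
    if density < sparse_frac:          # sparse features: raise sensitivity
        return fine > fine.mean() + 0.5 * fine.std()
    return fine > t
```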
A fundamental element of future generic pattern recognition technology is the ability to extract similar patterns for the
same scene despite wide-ranging extraneous variables, including lighting, turbidity, sensor exposure variations, and
signal noise. In the process of demonstrating pattern constancy of this kind for retinex/visual servo (RVS) image
enhancement processing, we found that the pattern constancy performance depended somewhat on scene content. Most
notably, the scene topography and, in particular, the scale and extent of the topography in an image, affects the pattern
constancy the most. This paper will explore these effects in more depth and present experimental data from several time
series tests. These results further quantify the impact of topography on pattern constancy. Despite this residual
inconstancy, the results of overall pattern constancy testing support the idea that RVS image processing can be a
universal front-end for generic visual pattern recognition. While the effects of topography on pattern constancy were significant, RVS processing still achieves a high degree of pattern constancy over a wide spectrum of scene content diversity and wide-ranging extraneous variations in lighting, turbidity, and sensor exposure.
Over the last few years NASA Langley Research Center (LaRC) has been developing an Enhanced Vision System (EVS) to aid pilots while flying in poor visibility conditions. The EVS captures imagery using two infrared video cameras. The cameras are placed in an enclosure that is mounted, forward-looking, underneath the NASA LaRC ARIES 757 aircraft. The data streams from the cameras are processed in real time and displayed on monitors on board the aircraft. With proper processing the camera system can provide better-than-human-observed imagery, particularly during poor visibility conditions. However, achieving this goal requires several different stages of processing, including enhancement, registration, and fusion, as well as specialized processing hardware for real-time performance. We are using a real-time implementation of the Retinex algorithm for image enhancement, affine transformations for registration, and weighted sums to perform fusion. All of the algorithms are executed on a single TI DM642 digital signal processor (DSP) clocked at 720 MHz. The image processing components were added to the EVS, tested, and demonstrated during flight tests in August and September of 2005. In this paper we briefly discuss the EVS image processing hardware and algorithms. We then discuss implementation issues and show examples of the results obtained during flight tests.
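Schematically, the per-frame pipeline is enhance, register, fuse. The sketch below is a stand-in, not the DSP flight code: the Retinex stage is reduced to a single log-surround operation, the inter-camera affine is assumed known, and the fusion weights are equal by default.

```python
import numpy as np
from scipy import ndimage

def enhance(img, sigma=15.0):
    """Stand-in for the real-time Retinex: log(image) - log(surround)."""
    img = img.astype(float) + 1.0
    return np.log(img) - np.log(ndimage.gaussian_filter(img, sigma))

def register_affine(img, matrix, offset):
    """Warp one camera's frame into the other's coordinates (known affine)."""
    return ndimage.affine_transform(img, matrix, offset=offset)

def fuse(frames, weights=None):
    """Weighted-sum fusion of co-registered, enhanced frames."""
    weights = weights or [1.0 / len(frames)] * len(frames)
    return sum(w * f for w, f in zip(weights, frames))

# per-frame loop for two sensors, with hypothetical alignment parameters
a = np.random.rand(240, 320); b = np.random.rand(240, 320)
b_reg = register_affine(b, np.eye(2) * 1.01, offset=(2.0, -1.5))
out = fuse([enhance(a), enhance(b_reg)])
```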
Advances in space robotics technology hinge to a large extent upon the development and deployment of sophisticated new vision-based methods for automated in-space mission operations and scientific survey. To this end, we have developed a new concept for automated terrain analysis that is based upon a generic image enhancement platform: multi-scale retinex (MSR) and visual servo (VS) processing. This pre-conditioning with the MSR and the VS produces a "canonical" visual representation that is largely independent of lighting variations and exposure errors. Enhanced imagery is then processed with a biologically inspired two-channel edge detection process, followed by a smoothness-based criterion for image segmentation. Landing sites can be automatically determined by examining the results of the smoothness-based segmentation, which shows those areas in the image that surpass a minimum degree of smoothness. Though the MSR has proven to be a very strong enhancement engine, the other elements of the approach (the VS, terrain map generation, and smoothness-based segmentation) are in early stages of development. Experimental results on data from the Mars Global Surveyor show that the imagery can be processed to automatically obtain smooth landing sites. In this paper, we describe the method used to obtain these landing sites, and also examine the smoothness criteria in terms of the imager and scene characteristics. Several examples of applying this method to simulated and real imagery are shown.
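A minimal sketch of the smoothness-based screening, assuming local RMS gradient magnitude as the smoothness measure; since the abstract notes this criterion is still in early development, the window size and thresholds here are purely illustrative.

```python
import numpy as np
from scipy import ndimage

def smoothness_map(img, win=15):
    """Local RMS gradient magnitude: low values mean smooth terrain."""
    gx, gy = np.gradient(img.astype(float))
    return np.sqrt(ndimage.uniform_filter(gx**2 + gy**2, win))

def candidate_sites(img, max_roughness=0.05, min_area=400):
    """Mark regions smoother than a threshold and large enough to land on."""
    smooth = smoothness_map(img) < max_roughness
    labels, n = ndimage.label(smooth)
    sizes = ndimage.sum(smooth, labels, range(1, n + 1))
    keep = 1 + np.flatnonzero(sizes >= min_area)
    return np.isin(labels, keep)
```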
Aerial imagery of the Earth is an invaluable tool for the assessment of ground features, especially during times of disaster. Researchers at NASA's Langley Research Center have developed techniques which have proven to be useful for such imagery. Aerial imagery from various sources, including Langley's Boeing 757 Aries aircraft, has been studied extensively. This paper discusses these studies and demonstrates that better-than-observer imagery can be obtained even when visibility is severely compromised. A real-time, multi-spectral experimental system will be described and numerous examples will be shown.
Aerial images from the Follow-On Radar, Enhanced and Synthetic Vision Systems Integration Technology
Evaluation (FORESITE) flight tests with the NASA Langley Research Center's research Boeing 757 were
acquired during severe haze and haze/mixed clouds visibility conditions. These images were enhanced using
the Visual Servo (VS) process that makes use of the Multiscale Retinex. The images were then quantified with
visual quality metrics used internally within the VS. One of these metrics, the Visual Contrast Measure, has
been computed for hundreds of FORESITE images, and for major classes of imaging: terrestrial (consumer), orbital Earth observations, orbital Mars surface imaging, NOAA aerial photographs, and underwater imaging.
The metric quantifies both the degree of visual impairment of the original, un-enhanced images and the degree of visibility improvement achieved by the enhancement process. The large aggregate data set exhibits trends
relating to degree of atmospheric visibility attenuation, and its impact on the limits of enhancement performance
for the various image classes. Overall results support the idea that in most cases that do not involve extreme
reduction in visibility, large gains in visual contrast are routinely achieved by VS processing. Additionally, for
very poor visibility imaging, lesser, but still substantial, gains in visual contrast are also routinely achieved. Further, the data suggest that these visual quality metrics can be used as external standalone metrics for
establishing performance parameters.
The Multiscale Retinex With Color Restoration (MSRCR) is a non-linear image enhancement algorithm that provides simultaneous dynamic range compression, color constancy, and rendition. The overall impact is to brighten areas of poor contrast/lightness, but not at the expense of saturating areas of good contrast/brightness. The downside is that, given the poor signal-to-noise ratio most image acquisition devices have in dark regions, noise can also be greatly enhanced, thus affecting overall image quality. In this paper, we will discuss the impact of the MSRCR on the overall quality of an enhanced image as a function of the strength of shadows in the image and of its root-mean-square (RMS) signal-to-noise ratio (SNR).
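The RMS SNR figure referred to above can be made concrete. A minimal sketch, assuming noise is estimated from a user-chosen flat region of the frame (the region choice is hypothetical):

```python
import numpy as np

def rms_snr(img, flat_region):
    """RMS signal-to-noise ratio: mean signal over RMS noise,
    with noise estimated from a region known to be flat."""
    noise_rms = np.std(img[flat_region])
    return img.mean() / noise_rms

# usage: a dark, noisy frame of the kind shadows produce
img = np.random.normal(20, 4, (100, 100))
print(rms_snr(img, np.s_[:20, :20]))   # low SNR typical of dark regions
```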
Noise, whether due to the image-gathering device or some other reason,
reduces the visibility of fine features in an image. Several techniques
attempt to mitigate the impact of noise by performing a low-pass
filtering operation on the acquired data. This is based on the assumption
that the uncorrelated noise has high-frequency content and thus will be
suppressed by low-pass filtering. A result of this operation is that
edges in a noisy image also tend to get blurred, and, in some cases, may
get completely lost due to the low-pass filtering. In this paper, we
quantitatively assess the impact of noise on fine feature visibility by
using computer-generated targets of known spatial detail. Additionally,
we develop a new scheme for noise reduction based on the connectivity of edge features. The overall impact of this scheme is to reduce overall noise yet retain the high-frequency content that makes edge features sharp.
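One plausible reading of the connectivity idea (a sketch under our own assumptions, not the paper's exact scheme): label connected edge components, discard tiny components as noise, and low-pass filter only away from the surviving edges so sharp features keep their high-frequency content.

```python
import numpy as np
from scipy import ndimage

def connectivity_denoise(img, edge_thresh=0.1, min_len=10, sigma=1.5):
    img = img.astype(float)
    gx, gy = np.gradient(img)
    edges = np.hypot(gx, gy) > edge_thresh
    labels, n = ndimage.label(edges)                    # connected components
    sizes = ndimage.sum(edges, labels, range(1, n + 1))
    real_edges = np.isin(labels, 1 + np.flatnonzero(sizes >= min_len))
    smoothed = ndimage.gaussian_filter(img, sigma)
    # keep original pixels on connected edges; low-pass everywhere else
    return np.where(real_edges, img, smoothed)
```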
KEYWORDS: Digital signal processing, Cameras, Image processing, Signal processing, Short wave infrared radiation, Visibility, Image enhancement, Sensors, Long wavelength infrared, Video
Flying in poor visibility conditions, such as rain, snow, fog, or haze, is inherently dangerous. However, these
conditions can occur at nearly any location, so inevitably pilots must successfully navigate through them. At
NASA Langley Research Center (LaRC), under support of the Aviation Safety and Security Program Office
and the Systems Engineering Directorate, we are developing an Enhanced Vision System (EVS) that combines
image enhancement and synthetic vision elements to assist pilots flying through adverse weather conditions. This
system uses a combination of forward-looking infrared and visible sensors for data acquisition. A core function
of the system is to enhance and fuse the sensor data in order to increase the information content and quality
of the captured imagery. These operations must be performed in real-time for the pilot to use while flying. For
image enhancement, we are using the LaRC patented Retinex algorithm since it performs exceptionally well for
improving the low-contrast imagery typically seen during poor visibility conditions. In general,
real-time operation of the Retinex requires specialized hardware. To date, we have successfully implemented a
single-sensor real-time version of the Retinex on several different Digital Signal Processor (DSP) platforms. In
this paper we give an overview of the EVS and its performance requirements for real-time enhancement and
fusion and we discuss our current real-time Retinex implementations on DSPs.
Current still image and video systems are typically of limited use in poor visibility conditions such as rain, fog, smoke, and haze. These conditions severely limit the range and effectiveness of imaging systems because of the severe reduction in contrast. The NASA Langley Research Center's Visual Information Processing Group has developed an image enhancement technology based on the concept of a visual servo that has direct applications to the problem of poor visibility conditions. This technology has been used in cases of severe image turbidity in air as well as underwater with dramatic results. Use of this technology could result in greatly improved performance of perimeter surveillance systems; military, security, and law enforcement operations; port security, both on land and below water; and air and sea rescue services, thereby improving public safety.
The current X-ray systems used by airport security personnel for the detection of contraband, and objects such as knives and guns that can impact the security of a flight, have limited effect because of the limited display quality of the X-ray images. Since the displayed images do not possess optimal contrast and sharpness, it is possible for the security personnel to miss potentially hazardous objects. This problem is also common to other disciplines such as medical X-rays, and can be mitigated, to a large extent, by the use of state-of-the-art image processing techniques to enhance the contrast and sharpness of the displayed image. The NASA Langley Research Center's Visual Information Processing Group has developed an image enhancement technology that has direct applications to the problem of inadequate display quality. Airport security X-ray imaging systems would benefit considerably by using this novel technology, making the task of the personnel who have to interpret the X-ray images considerably easier, faster, and more reliable. This improvement would translate into more accurate screening as well as minimizing the screening time delays to airline passengers. This technology, Retinex, has been optimized for consumer applications but has been applied to medical X-rays on a very preliminary basis. The resultant technology could be incorporated into a new breed of commercial X-ray imaging systems which would be transparent to the screener yet allow them to see subtle detail much more easily, reducing the amount of time needed for screening while greatly increasing the effectiveness of contraband detection and thus improving public safety.
KEYWORDS: Digital signal processing, Image processing, Video, Image enhancement, Field programmable gate arrays, Signal processing, Cameras, Video processing, Detection and tracking algorithms, Data processing
The Retinex is a general-purpose image enhancement algorithm that is used to produce good visual representations
of scenes. It performs a non-linear spatial/spectral transform that synthesizes strong local contrast
enhancement and color constancy. A real-time, video frame rate implementation of the Retinex is required to meet the needs of various potential users. Retinex processing contains a relatively large number of complex
computations, thus to achieve real-time performance using current technologies requires specialized hardware
and software. In this paper we discuss the design and development of a digital signal processor (DSP) implementation
of the Retinex. The target processor is a Texas Instruments TMS320C6711 floating point DSP. NTSC video is captured using a dedicated frame-grabber card, Retinex processed, and displayed on a standard
monitor. We discuss the optimizations used to achieve real-time performance of the Retinex and also describe our future plans on using alternative architectures.
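For reference, the core computation being mapped to the DSP is, at a single scale, a log ratio of each pixel to its Gaussian-weighted surround. A minimal single-scale sketch (the fixed-point and memory optimizations that make it real-time are not shown):

```python
import numpy as np
from scipy import ndimage

def single_scale_retinex(img, sigma=80.0):
    """R(x, y) = log I(x, y) - log [G_sigma * I](x, y)."""
    img = img.astype(float) + 1.0            # avoid log(0)
    surround = ndimage.gaussian_filter(img, sigma)
    r = np.log(img) - np.log(surround)
    # stretch to a displayable 8-bit range
    r = (r - r.min()) / (r.max() - r.min() + 1e-12)
    return (255 * r).astype(np.uint8)
```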
Noise is the primary visibility limit in the process of non-linear image enhancement, and it is no longer statistically stable additive noise in the post-enhancement image. Therefore, novel approaches are needed to
both assess and reduce spatially variable noise at this stage in overall
image processing. Here we will examine the use of edge pattern analysis
both for automatic assessment of spatially variable noise and as a
foundation for new noise reduction methods.
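The assessment half of this program can be sketched as follows, under our own assumptions: estimate a noise map from the standard deviation of blocks containing no detected edge pattern, yielding a spatially variable estimate rather than a single global figure. Block size and edge threshold are hypothetical.

```python
import numpy as np
from scipy import ndimage

def local_noise_map(img, block=16, edge_thresh=0.1):
    """Spatially variable noise estimate from edge-free blocks."""
    img = img.astype(float)
    gx, gy = np.gradient(img)
    edges = np.hypot(gx, gy) > edge_thresh
    h, w = img.shape
    noise = np.full((h // block, w // block), np.nan)
    for i in range(h // block):
        for j in range(w // block):
            sl = np.s_[i * block:(i + 1) * block, j * block:(j + 1) * block]
            if not edges[sl].any():        # block free of edge patterns
                noise[i, j] = img[sl].std()
    return noise   # NaN where edges prevent a clean estimate
```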
Classical segmentation algorithms subdivide an image into its
constituent components based upon some metric that defines commonality
between pixels. Often, these metrics incorporate some measure of
"activity" in the scene, e.g. the amount of detail that is in a region.
The Multiscale Retinex with Color Restoration (MSRCR) is a general
purpose, non-linear image enhancement algorithm that significantly
affects the brightness, contrast and sharpness within an image. In this
paper, we will analyze the impact the MSRCR has on segmentation results
and performance.
KEYWORDS: Visualization, Visibility, Visibility through fog, Image processing, Servomechanisms, Image enhancement, Fiber optic gyroscopes, Signal to noise ratio, Interference (communication), Control systems
The advancement of non-linear processing methods for generic automatic clarification of turbid imagery has led us from extensions of entirely passive multiscale Retinex processing to a new framework of active measurement and control of the enhancement process called the Visual Servo. In the process of testing this new non-linear
computational scheme, we have identified that feature visibility limits in the post-enhancement image now simplify to a single signal-to-noise figure of merit: a feature is visible if the feature-background signal difference is greater than the RMS noise level. In other words, a signal-to-noise limit of approximately unity constitutes a lower limit on feature visibility.
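The figure of merit stated above reduces to a one-line test. A minimal sketch, with the feature and background taken as mean levels over user-chosen pixel sets:

```python
import numpy as np

def feature_visible(feature_px, background_px, noise_rms):
    """Visible iff the feature-background signal difference exceeds the
    RMS noise level, i.e., an SNR of roughly unity is the visibility floor."""
    return abs(np.mean(feature_px) - np.mean(background_px)) > noise_rms
```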
An Enhanced Vision System (EVS) utilizing multi-sensor image fusion is currently under development at the NASA Langley Research Center. The EVS will provide enhanced images of the flight environment to assist pilots in poor visibility conditions. Multi-spectral images obtained from a short wave infrared (SWIR), a long wave infrared (LWIR), and a color visible band CCD camera are enhanced and fused using the Retinex algorithm. The images from the different sensors do not have a uniform data structure: the three sensors not only operate
at different wavelengths, but they also have different spatial resolutions, optical fields of view (FOV), and bore-sighting inaccuracies. Thus, in order to perform image fusion, the images must first be co-registered. Image registration is the task of aligning images taken at different times, from different sensors, or from different viewpoints, so that all corresponding points in the images match. In this paper, we present two methods for registering multiple multi-spectral images. The first method performs registration using sensor specifications to match the FOVs and resolutions directly through image resampling. In the second method, registration is
obtained through geometric correction based on a spatial transformation defined by user selected control points and regression analysis.
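The second method translates directly into code: fit an affine spatial transformation to the user-selected control-point pairs by least-squares regression, then resample. The affine model and regression fit follow the abstract; the solver and interpolation details are our choices.

```python
import numpy as np
from scipy import ndimage

def fit_affine(src_pts, dst_pts):
    """Least-squares affine from control-point pairs: dst ~ A @ src + t."""
    src = np.asarray(src_pts, float)
    dst = np.asarray(dst_pts, float)
    X = np.hstack([src, np.ones((len(src), 1))])   # rows are [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A = coeffs[:2].T        # 2x2 linear part
    t = coeffs[2]           # translation
    return A, t

def register(img, A, t):
    """Resample img so its control points land on the reference image's."""
    # ndimage maps output coords to input coords: invert the fitted transform
    A_inv = np.linalg.inv(A)
    return ndimage.affine_transform(img, A_inv, offset=-A_inv @ t)
```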
When a linear image acquisition device captures an image, several noise sources impact the quality of the final image that can be digitally processed. We have previously examined these noise sources in terms of their impact on the total amount of information contained in the image and on the restorability of the image data. In this paper, we will examine the effect of each of the noise sources on final image quality.
The experience of retinex image processing has prompted us to reconsider fundamental aspects of imaging and image processing. Foremost is the idea that a good visual representation requires a non-linear transformation of the recorded (approximately linear) image data. Further, this transformation appears to converge on a specific distribution. Here we investigate the connection between numerical and visual phenomena. Specifically, the questions explored are: (1) Is there a well-defined, consistent statistical character associated with good visual representations? (2) Does there exist an ideal visual image? (3) What are its statistical properties?
A new approach to sensor fusion and enhancement is presented. The retinex image enhancement algorithm is used to jointly enhance and fuse data from long wave infrared, short wave infrared and visible wavelength sensors. This joint optimization results in fused data which contains more information than any of the individual data streams. This is especially true in turbid weather conditions, where the long wave infrared sensor would conventionally be the only source of usable information. However, the retinex algorithm can be used to pull out the details from the other data streams as well, resulting in greater overall information. The fusion uses the multiscale nature of the algorithm to both enhance and weight the contributions of the different data streams forming a single output data stream.
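One plausible reading of the multiscale weighting (a sketch under our own assumptions, not the published algorithm): compute retinex outputs at several surround scales for each sensor stream, weight each stream at each scale by its local signal energy, and sum everything into a single output stream.

```python
import numpy as np
from scipy import ndimage

def retinex_scales(img, sigmas=(15, 80, 250)):
    img = img.astype(float) + 1.0
    return [np.log(img) - np.log(ndimage.gaussian_filter(img, s))
            for s in sigmas]

def fuse_streams(streams):
    """Energy-weighted sum over sensors, computed per retinex scale."""
    out = 0.0
    for per_sensor in zip(*[retinex_scales(s) for s in streams]):
        energy = [ndimage.uniform_filter(r**2, 31) for r in per_sensor]
        total = sum(energy) + 1e-12
        out = out + sum(e / total * r for e, r in zip(energy, per_sensor))
    return out

# LWIR, SWIR, and visible stand-ins
fused = fuse_streams([np.random.rand(240, 320) for _ in range(3)])
```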
KEYWORDS: Visualization, Image enhancement, Image processing, Information visualization, Image quality, Image visualization, Digital image processing, Human vision and color perception, Light sources and illumination, Algorithm development
In the last published concept (1986) for a Retinex computation, Edwin Land introduced a center/surround spatial form, which was inspired by the receptive field structures of neurophysiology. With this as our starting point, we have over the years developed this concept into a full-scale automatic image enhancement algorithm: the Multi-Scale Retinex with Color Restoration (MSRCR), which combines color constancy with local contrast/lightness enhancement to transform digital images into renditions that approach the realism of direct scene observation. The MSRCR algorithm has proven to be quite general purpose, and very resilient to common forms of image pre-processing such as reasonable ranges of gamma and contrast stretch transformations. More recently we have been exploring the fundamental scientific implications of this form of image processing, namely: (i) the visual inadequacy of the linear representation of digital images, (ii) the existence of a canonical or statistically ideal visual image, and (iii) new measures of visual quality based upon these insights derived from our extensive experience with MSRCR enhanced images. The lattermost serves as the basis for future schemes for automating visual assessment, a primitive first step in bringing visual intelligence to computers.
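The MSRCR has a compact published form: a weighted sum of single-scale retinex outputs over several surround scales, multiplied by a color-restoration factor. A minimal sketch; the gain and alpha constants are illustrative placeholders rather than the tuned production values.

```python
import numpy as np
from scipy import ndimage

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, gain=2.8):
    """Multi-Scale Retinex with Color Restoration on an RGB float image."""
    img = img.astype(float) + 1.0
    msr = np.zeros_like(img)
    for s in sigmas:                     # equal-weight sum over scales
        for c in range(3):
            blur = ndimage.gaussian_filter(img[..., c], s)
            msr[..., c] += (np.log(img[..., c]) - np.log(blur)) / len(sigmas)
    # color restoration: log ratio of each band to the spectral sum
    crf = np.log(alpha * img / img.sum(axis=2, keepdims=True))
    out = gain * crf * msr
    out = (out - out.min()) / (out.max() - out.min() + 1e-12)
    return (255 * out).astype(np.uint8)
```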
KEYWORDS: Visualization, Image processing, Image enhancement, Information visualization, Image visualization, Sensors, Signal to noise ratio, Color vision, Signal processing, Distortion
The history of the spatial aspect of color perception is reviewed in order to lay a foundation for the discussion of retinex image processing. While retinex computations were originally conceived as a model for color constancy in human vision, the impact on local contrast and lightness is even more pronounced than the compensation for changes in the spectral distribution of scene illuminants. In the MSRCR, the goal of the computation is fidelity to the direct observation of scenes. The primary visual shortcoming of the recorded image is that dark zones such as shadow zones are perceived with much lower contrast and lightness than for the direct viewing of scenes. Extensive development and testing of the MSRCR leads us to form several hypotheses about imaging which appear to be basic and general in nature. These are that: 1) the linear representation of the image is not usually a good visual representation, 2) retinex image enhancements tend to approach a statistical ideal which suggests the existence of a canonical visual image, and 3) the mathematical form of the MSRCR suggests a deterministic definition of visual information which is the log of the spectral and spatial context ratios for any given image. These ideas imply that the imaging process should be thought of, not as a replication process whose goal is minimal distortion, but rather as a profound non-linear transformation process whose goal is a statistical ideal visual representation. These insights suggest new directions for practical advances in bringing higher levels of visual intelligence to the world of computing.
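Hypothesis 3) can be transcribed into a formula. Reading "the log of the spectral and spatial context ratios" literally (our transcription of the verbal definition, with a Gaussian-weighted surround as one plausible choice of spatial context):

$$V_i(x,y) = \log\frac{I_i(x,y)}{\bar{I}_i(x,y)},$$

where $I_i$ is the recorded radiance in spectral band $i$ and $\bar{I}_i(x,y)$ is its local spatial context at $(x,y)$, so that visual information is the log ratio of each pixel to its spectral and spatial context.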
KEYWORDS: Image compression, Human vision and color perception, Visualization, Image processing, Cameras, Light sources and illumination, Analog electronics, Image enhancement, Digital cameras, Photography
The human vision system performs the tasks of dynamic range compression and color constancy almost effortlessly. The same tasks pose a very challenging problem for imaging systems whose dynamic range is restricted by either the dynamic response of film, in the case of analog cameras, or by the analog-to-digital converters, in the case of digital cameras. The images thus formed are unable to encompass the wide dynamic range present in most natural scenes. Whereas the human visual system is quite tolerant to spectral changes in lighting conditions, these strongly affect both the film response for analog cameras and the filter responses for digital cameras, leading to incorrect color formulation in the acquired image. Our multiscale retinex, based in part on Edwin Land's work on color constancy, provides a fast, simple, and automatic technique for simultaneous dynamic range compression and accurate color rendition. The retinex algorithm is non-linear and global in extent: the output at a point is also a function of its surround. We compare it with conventional dynamic range compression techniques such as the application of point non-linearities. The applications of such an algorithm are many, from medical imaging to remote sensing, and from commercial photography to color transmission.
Visual communication, in the form of telephotography and television, for example, can be regarded as efficient only if the amount of information that it conveys about the scene to the observer approaches the maximum possible and the associated cost approaches
the minimum possible. Elsewhere we have addressed the problem of assessing the end-to-end performance of visual communication systems in terms of their efficiency in this sense by integrating the critical limiting factors that constrain image gathering into classical communication theory. We use this approach to assess the electro-optical design of image-gathering devices as a function of the f-number and apodization of the objective lens and the aperture size and sampling geometry of the photodetection mechanism. Results show that an image-gathering device that is designed to optimize information capacity performs similarly to the human eye. For both, the performance approaches the maximum possible, in terms of the efficiency with which the acquired information can be transmitted as decorrelated data, and the fidelity, sharpness, and clarity with which fine detail can be restored.
Visual communication can be regarded as efficient only if the amount of information that it conveys from the scene to the observer approaches the maximum possible and the associated cost approaches the minimum possible. To deal with this problem, Fales and Huck have integrated the critical limiting factors that constrain image gathering into classical concepts of communication theory. This paper uses this approach to assess the electro-optical design of the image gathering device. Design variables include the f-number and apodization of the objective lens, the aperture size and sampling geometry of the photodetection mechanism, and lateral inhibition and nonlinear radiance-to-signal conversion akin to the retinal processing in the human eye. It is an agreeable consequence of this approach that the image gathering device that is designed along the guidelines developed from communication theory behaves very much like the human eye. The performance approaches the maximum possible in terms of the information content of the acquired data, and thereby, the fidelity, sharpness and clarity with which fine detail can be restored, the efficiency with which the visual information can be transmitted in the form of decorrelated data, and the robustness of these two attributes to the temporal and spatial variations in scene illumination.
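The quantity being maximized in this design approach is a Shannon-type information capacity of the image-gathering channel. As a hedged paraphrase of the Fales-Huck formulation (simplified here by folding sampling and aliasing effects into the passband and noise terms):

$$\mathcal{H} = \frac{1}{2}\iint_{\hat{B}} \log_2\!\left[1 + \frac{|\hat{\tau}(\nu)|^{2}\,\hat{\Phi}_L(\nu)}{\hat{\Phi}_N(\nu)}\right] d\nu,$$

where $\hat{\tau}$ is the spatial frequency response set by the objective lens (f-number, apodization) and the photodetector aperture, $\hat{\Phi}_L$ is the scene radiance power spectral density, $\hat{\Phi}_N$ is the noise power spectral density, and $\hat{B}$ is the sampling passband fixed by the sampling geometry.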
The development of generalized contour/texture discrimination techniques is a central element necessary for machine vision recognition and interpretation of arbitrary images. Here the visual perception of texture, selected studies of texture analysis in machine vision, and diverse small samples of contour and texture are all used to provide insights into the fundamental characteristics of contour and texture. From these, an experimental discrimination scheme is developed and tested on a battery of natural images. The processing of contour and texture is considered as a unified problem of zonal determinations of stasis versus change. Studies of the visual perception of texture define fine texture as a subclass that is interpreted as shading and is distinct from coarse figural-similarity textures. Perception also sets the smallest scale for contour/texture discrimination at 8 to 9 visual acuity units. Three contour/texture discrimination parameters were found to be moderately successful at this scale of discrimination: (1) lightness change in a blurred version of the image, (2) change in lightness change in the original image, and (3) percent change in edge counts relative to the local maximum.
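The three parameters translate almost line for line into code. A sketch under our own assumptions about blur scale, window size, and edge thresholds:

```python
import numpy as np
from scipy import ndimage

def contour_texture_params(img, sigma=3.0, win=9):
    """Three local contour/texture discrimination measures from the text."""
    img = img.astype(float)
    # (1) lightness change in a blurred version of the image
    bx, by = np.gradient(ndimage.gaussian_filter(img, sigma))
    p1 = np.hypot(bx, by)
    # (2) change in lightness change (second difference) in the original
    gx, gy = np.gradient(img)
    grad = np.hypot(gx, gy)
    ggx, ggy = np.gradient(grad)
    p2 = np.hypot(ggx, ggy)
    # (3) percent change in edge counts relative to the local maximum
    edges = grad > grad.mean() + grad.std()
    counts = ndimage.uniform_filter(edges.astype(float), win)
    local_max = ndimage.maximum_filter(counts, win) + 1e-12
    p3 = 100.0 * (local_max - counts) / local_max
    return p1, p2, p3
```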