Infrared images are widely used in security monitoring and autonomous driving because of their robustness to illumination changes and adverse weather. However, the low contrast and colorless nature of infrared images limits both human observation and downstream detection and recognition algorithms. Translating infrared images into visible images can overcome these shortcomings, and among existing approaches, contrastive learning methods using self-similarity features achieve the best performance. However, contrastive learning methods suffer from sub-optimal training caused by random sampling, and the self-similarity contrastive loss suffers from encoding entanglement when applied to infrared-visible translation: features extracted from regions of different categories cannot be distinguished. We therefore propose a cross-similarity guided contrastive learning method for infrared-visible image translation. First, to address the randomness and inefficiency of random sampling in the contrastive loss, a sampling strategy based on ranking the information entropy of the cross-similarity matrix is proposed: by computing the information entropy of the cross-similarity matrix between the input and generated images and sorting the results, the most informative sampling points are obtained for the subsequent contrastive loss. Second, to alleviate the encoding entanglement caused by the low contrast of infrared images, a multi-scale spatially adjacent graph structure consistency loss and a spatially separated graph structure consistency loss, both built on cross-similarity matrices, are proposed. Experiments on the KAIST and FLIR datasets show that the proposed method achieves the best scores and visual quality among several advanced infrared-visible translation methods, and ablation studies further confirm its effectiveness.
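As a minimal illustration of the entropy-ranked sampling idea, the NumPy sketch below scores each spatial location by the Shannon entropy of its row in the cross-similarity matrix and keeps the top-k; the function name, the softmax normalization, and k are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def entropy_ranked_sampling(feat_in, feat_gen, k=64):
    """Pick the k feature locations whose cross-similarity rows carry
    the most information (highest entropy).

    feat_in, feat_gen: (N, C) L2-normalised feature vectors from the
    input and generated images (N spatial locations, C channels).
    Returns the indices of the k selected locations.
    """
    # Cross-similarity matrix between input and generated features.
    sim = feat_in @ feat_gen.T                      # (N, N)
    # Softmax turns each row into a probability distribution.
    p = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    # Shannon entropy per row; higher = more informative location.
    h = -(p * np.log(p + 1e-12)).sum(axis=1)        # (N,)
    # Indices of the k highest-entropy rows.
    return np.argsort(h)[::-1][:k]
```

Locations whose similarity distribution spreads over many positions (high entropy) are treated as the most informative sampling points.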
We propose a novel image inpainting model, DFG-GAN, which effectively alleviates artifacts when the missing region is very large. Unlike other image inpainting models, ours can turn the inpainting task into a pure GAN generation task when the mask covers the entire image. In addition, we exploit extra class-label information describing what kind of image the damaged input is; the more information fed in, the better the result. Experiments on several publicly available datasets demonstrate the advantage of the proposed method over existing approaches in both visual fidelity and margin texture.
Recognizing transmission tower numbers is an important part of the automatic inspection of high-voltage transmission lines. However, it is infeasible to accomplish this task effectively in one step given the large scene images shot by unmanned aerial vehicles. In this paper, we present a cascaded framework consisting of two CNN components: number plate detection and serial number recognition. The proposed method reduces the difficulty of localizing number characters in large scenes by leveraging a robust background cue, the number plates. On the one hand, the cascaded coarse-to-fine design reduces the miss rate and improves detection accuracy; on the other hand, it greatly reduces recognition complexity. Experimental results on our collected dataset demonstrate the effectiveness of the proposed method.
With the development of intelligent transportation and parking, license plate detection in open environments is in great demand. However, due to background clutter and the variation of license plates, existing methods cannot strike a good balance between accuracy and efficiency. A method based on semantic region proposals is presented. Working at the pixel level, the method first adopts a semantic segmentation convolutional network to extract license plate candidate regions. To improve segmentation accuracy, an enhanced loss function is designed. Afterward, a classification and regression network based on an oriented bounding box regression algorithm is used for region verification and refinement. Experiments on three public datasets show that the proposed method adapts to license plate images captured under different scenarios and achieves better performance than state-of-the-art methods.
Recently, object detection has been widely used in power systems to assist fault diagnosis of transmission lines. However, it still faces great challenges due to the multi-size targets present in a single inspection image. Current state-of-the-art object detection pipelines, such as Faster R-CNN, perform well on large objects even at low resolution, but usually fail to detect small objects due to their low resolution and poor feature representation. Many existing detectors address this problem with feature pyramids, multi-scale image inputs, and the like, which attain high accuracy but are computation- and memory-intensive. In this paper, we propose an improved cascaded Faster R-CNN framework that reduces computational cost while maintaining high detection accuracy for multi-size object detection in high-resolution inspection images: the first-stage Faster R-CNN detects large objects, while the second stage detects small objects relative to the large ones. We further merge the two stages into a single network by sharing convolutional features: using the semantic context between multi-size targets, the first stage tells the second where to look. For the "tell" step, we simply map the bounding box coordinates of large objects detected in the first stage onto the VGG16 feature maps, crop the corresponding regions, and feed them to the second stage. Experiments on the test datasets demonstrate that our method achieves a higher detection mAP of 87.6% at 5 FPS on an NVIDIA Titan X compared with the one-stage Faster R-CNN.
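The "tell" step reduces to a coordinate mapping from image space onto the shared feature map; the sketch below is an illustration under the assumption of a VGG16 backbone with a cumulative stride of 16 at conv5, and the function name is ours, not the paper's.

```python
import numpy as np

def crop_feature_roi(feature_map, box, stride=16):
    """Map a first-stage bounding box onto a conv feature map and crop it.

    feature_map: (C, H, W) backbone output (e.g. VGG16 conv5).
    box: (x1, y1, x2, y2) in input-image pixel coordinates.
    stride: cumulative downsampling factor of the backbone
            (16 for VGG16 up to conv5, an assumption here).
    """
    x1, y1, x2, y2 = box
    # Divide image coordinates by the stride to land on the feature grid;
    # floor the top-left and ceil the bottom-right to keep the whole object.
    fx1, fy1 = int(np.floor(x1 / stride)), int(np.floor(y1 / stride))
    fx2, fy2 = int(np.ceil(x2 / stride)), int(np.ceil(y2 / stride))
    return feature_map[:, fy1:fy2, fx1:fx2]
```

The cropped feature region then serves as the second stage's input, so small objects are searched only inside large-object context.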
In recent years, applications of age and gender estimation from face images have become increasingly wide and deep. Existing age and gender estimation pipelines usually process images with classical machine learning methods such as SVM and AdaBoost. However, the performance of such methods is usually limited to images with strict conditions or simple backgrounds, and age and gender estimation in open environments still faces enormous challenges. In this paper, we introduce a method based on a double-channel convolutional neural network (CNN) for accurate age and gender estimation in complex scenarios. First, face regions are detected, whether the image contains a single face or multiple faces. Second, faces are aligned based on facial landmark detection. Finally, a double-channel CNN structure with XGBoost is used to train the model for age and gender estimation. Experiments show that the proposed double-channel CNN method achieves higher accuracy at comparable time cost than a single-channel CNN and is robust to face images captured in the wild.
Due to variations in background, illumination, and viewpoint, license plate detection in an open environment is challenging. We propose a detection method based on boundary clustering. First, a boundary map is obtained through a Canny edge detector followed by the removal of unwanted horizontal background edges. Second, boundaries are classified into different clusters by a density-based approach, in which the density of each boundary is defined by the total gradient intensity of its neighboring and reachable boundaries; the cluster centers and their number are determined automatically according to a minimum-distance principle. Finally, a set of horizontal candidate regions with accurately located borders is extracted for classification. The classifier is trained on the histogram of oriented gradients feature with a linear support vector machine. Experiments on three public datasets including images captured under different scenarios demonstrate that the proposed method outperforms several state-of-the-art methods in detection accuracy while remaining comparable in efficiency.
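The boundary-map step can be approximated in a few lines. This sketch is a simplified stand-in for the Canny-based step: it thresholds raw gradient magnitude and suppresses near-horizontal edges, whose gradients point mostly vertically; the threshold values are illustrative assumptions.

```python
import numpy as np

def boundary_map(gray, thresh=30.0):
    """Rough boundary map: edge strength from image gradients, with
    near-horizontal edges suppressed, since license-plate character
    strokes are dominated by vertical edges.
    """
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    edges = mag > thresh
    # A horizontal edge has a mostly vertical gradient (|gy| >> |gx|);
    # drop those to remove horizon-like background boundaries.
    horizontal = np.abs(gy) > 2.0 * np.abs(gx)
    return edges & ~horizontal
```

A vertical step edge survives this filter while a horizontal one is removed, which is the behavior the clustering stage relies on.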
Scene text recognition has gained significant attention in the computer vision community. Character detection and recognition are the premise of text recognition and affect the overall performance to a large extent. We propose a well-initialized model for scene character recognition from cropped text regions. We use constrained character body structures with deformable part-based models to detect and recognize characters against various backgrounds. The character body structures are obtained by an unsupervised discriminative clustering approach followed by a statistical model and a self-built minimum spanning tree model. Our method utilizes part appearance and location information, and combines character detection and recognition within cropped text regions. Evaluation results on benchmark datasets demonstrate that the proposed scheme outperforms state-of-the-art methods in both scene character recognition and word recognition.
KEYWORDS: Image segmentation, Binary data, Simulation of CCA and DLA aggregates, Image quality, Image processing, Sensors, Distortion, Optical character recognition, Detection and tracking algorithms, Intelligence systems
Character segmentation (CS) plays an important role in automatic license plate recognition and has been studied for decades. A method using multiscale template matching is proposed to solve the CS problem for Chinese license plates. It operates on a binary image integrated from maximally stable extremal region detection and Otsu thresholding. Afterward, a uniform harrow-shaped template with variable length is designed, by virtue of which a three-dimensional matching space is constructed for the search of candidate segmentations. These segmentations are detected at matches with local minimum responses. Finally, the vertical boundaries of each single character are located for subsequent recognition. Experiments on a dataset of 2349 license plate images of different quality levels show that the proposed method achieves higher accuracy at comparable time cost and is robust to images in poor condition.
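As a rough, simplified stand-in for the template-matching search (not the paper's harrow-shaped template itself), segmentation candidates can be taken at local minima of the vertical projection of the binary plate image, the quantity such a template responds to; the `min_gap` parameter is an assumption for illustration.

```python
import numpy as np

def candidate_cuts(binary_plate, min_gap=2):
    """Segmentation candidates at local minima of the vertical
    projection of a binary plate image (1 = character stroke).
    """
    proj = binary_plate.sum(axis=0).astype(float)
    cuts = []
    for x in range(1, len(proj) - 1):
        # Local minimum: fewer stroke pixels than both neighbours.
        if proj[x] <= proj[x - 1] and proj[x] < proj[x + 1]:
            # Keep candidates at least min_gap columns apart.
            if not cuts or x - cuts[-1] >= min_gap:
                cuts.append(x)
    return cuts
```

The real method searches a richer three-dimensional space (position, scale, template length), but the local-minimum-response idea is the same.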
The nucleus and cytoplasm are both essential for white blood cell recognition, but the edges of the cytoplasm are often too blurry to detect because of unstable staining and overexposure. This paper proposes a cytoplasm enhancement operator (CEO) to achieve accurate convergence of the active contour model. The CEO contains two parts. First, a nonlinear over-exposure enhancer map is computed to correct over-exposure; it suppresses background noise while preserving details and improving contrast. Second, the over-exposed regions of the cytoplasm in particular are further enhanced by a tri-modal histogram specification based on scale-space filtering. Experimental results show that the proposed CEO and its corresponding GVF snake are superior to other unsupervised segmentation approaches.
We present a robust method for detecting text in natural scenes. The work consists of four parts. First, the images are automatically partitioned into different layers based on conditional clustering, which operates in two sequential ways: one uses a constrained clustering center and conditionally determined cluster numbers to generate small-size subregions, and the other uses fixed cluster numbers to generate full-size subregions. After clustering, we obtain a set of connected components (CCs) in each subregion. Second, a convolutional neural network (CNN) classifies those CCs into character components and non-character ones; the output score of the CNN is converted into the posterior probability of being a character. Third, we group the candidate characters into text strings based on probability and location. Finally, we apply a verification step. We adopt a multichannel strategy to evaluate performance on the public ICDAR2011 and ICDAR2013 datasets. The experimental results demonstrate that our algorithm achieves superior performance compared with state-of-the-art text detection algorithms.
Within intelligent transportation systems, fast and robust license plate localization (LPL) in complex scenes is still a challenging task. Real-world scenes introduce complexities such as variation in license plate size and orientation, uneven illumination, background clutter, and nonplate objects. These complexities lead to poor performance using traditional LPL features, such as color, edge, and texture. Recently, state-of-the-art performance in LPL has been achieved by applying the scale invariant feature transform (SIFT) descriptor to LPL for visual matching. However, for applications that require fast processing, such as mobile phones, SIFT does not meet the efficiency requirement due to its relatively slow computational speed. To address this problem, a new approach for LPL, which uses the oriented FAST and rotated BRIEF (ORB) feature detector, is proposed. The feature extraction in ORB is much more efficient than in SIFT and is invariant to scale and grayscale as well as rotation changes, and hence is able to provide superior performance for LPL. The potential regions of a license plate are detected by considering spatial and color information simultaneously, which is different from previous approaches. The experimental results on a challenging dataset demonstrate the effectiveness and efficiency of the proposed method.
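The efficiency argument for ORB-style features rests on binary descriptors matched by Hamming distance (a popcount of XOR) rather than float L2 norms. The following NumPy sketch is an illustration of that core operation, not the paper's pipeline; the `max_dist` threshold is an assumption.

```python
import numpy as np

def hamming_match(desc_a, desc_b, max_dist=64):
    """Brute-force Hamming matching of binary descriptors.

    desc_a: (Na, 32) uint8, desc_b: (Nb, 32) uint8 (256-bit descriptors).
    Returns (best_index_in_b, distance) for each descriptor in a; the
    index is -1 when the best match exceeds max_dist.
    """
    # XOR the packed bytes, then popcount by unpacking to bits.
    popcount = np.unpackbits(
        desc_a[:, None, :] ^ desc_b[None, :, :], axis=2).sum(axis=2)
    best = popcount.argmin(axis=1)
    dist = popcount[np.arange(len(desc_a)), best]
    best[dist > max_dist] = -1
    return best, dist
```

Because the inner loop is bitwise XOR and popcount, this matching is far cheaper than the floating-point distances SIFT descriptors require, which is the source of ORB's speed advantage cited above.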
We present a new method for fully automatic calibration of traffic cameras using the end points of dashed lane lines. Our approach uses an improved RANSAC method, aided by transverse projection of pixels, to detect the dashed lines and the end points on them. Then, by analyzing the geometric relationship between the camera and road coordinate systems, we construct a road model to fit the end points. Finally, using a two-dimensional calibration method, we convert pixel measurements in the image to meters along the ground-truth lane. In a large number of experiments covering a variety of conditions, our approach performs well, achieving less than 5% error in measuring test lengths in all cases.
License plate localization (LPL) in open environments is quite challenging due to variations in both the plates and the environment. In this paper, a new LPL algorithm based on the color pair and stroke width features of characters is proposed. The algorithm has four main steps. The image is first preprocessed with a Canny edge detector and the color pair feature is extracted. Edge pixels are then clustered into several groups using an EM-based method. Furthermore, the stroke width feature of the edge pixels in each group is extracted to remove false groups and background outliers. Finally, LP candidates are formed by morphological operations, and prior knowledge of license plates is used for verification and accurate localization. We test on a standard dataset of natural scene images with background noise, various viewpoints, changing illumination, and various plate sizes. The results show that the proposed algorithm achieves over 90% localization accuracy, with an average processing time of 250 ms per 640×480 image.
Wide line detection plays an important role in image analysis and computer vision. However, most existing algorithms focus on extracting line positions and lengths, ignoring line thickness and direction, which can deepen our understanding of images. This paper presents a novel wide line detector using the ridge distribution feature and a layer growth method. Unlike most existing edge and line detectors, which use directional derivatives, our method extracts ridge target points and uses layer growth to find lines, based entirely on an isotropic nonlinear filter. Ridge points are detected by the symmetry of their distribution under isotropic responses to circular masks, and the ridge orientation is determined roughly. A ridge point is selected as a seed point and then grown layer by layer to determine the width and orientation of the curvilinear structure accurately. Instead of point-by-point scanning, we label points in the growth region and adjust the scanning step adaptively, which improves efficiency. The proposed method detects the width and direction of lines accurately and dynamically, which is convenient for post-processing and application requirements. A series of tests on a variety of image samples demonstrates that the proposed method outperforms state-of-the-art methods.
In this paper, a novel algorithm is proposed for segmenting touching characters on license plates. In our method, rough and precise segmentation of characters proceed in sequence. First, characters on the license plate are roughly classified as segmented or touching by vertical projection on the edge map. Next, segmentation points of touching characters are estimated using the fixed ratios among character width, interval, and height. Finally, the touching characters are separated along a path created by the A* pathfinding algorithm. The proposed method is tested on 238 license plate images, achieving a successful segmentation rate of 97.06% in only 62 ms, which demonstrates its effectiveness and efficiency.
Accurate and fast curve detection in images is a challenging computer vision problem. The Hough transform (HT) is one of the most widely used techniques for curve detection, but existing HT-based methods suffer from low accuracy and low speed. In this paper, a new and efficient Hough transform for curve detection is presented. From a kinematics viewpoint, a curve can be regarded as the movement trajectory of a point, and the point's velocity direction is the tangential direction at that point on a smooth curve. The main contributions are threefold: 1) we formulate curve detection as robustly fitting a curve within a connected region; 2) we propose direction elements and a directional control scheme to quickly trace the smooth curve; 3) we use a coarse-to-fine strategy to efficiently detect the final curve. We have tested our algorithm on simulated and natural images. Compared with classical curve detection methods, experimental results indicate that our algorithm greatly reduces time cost and improves detection accuracy.
This paper presents a fast, precise method for segmenting white blood cells (WBCs) based on visual salient features. It is a two-stage algorithm: adaptive WBC location based on a salient map (AWLSM), which simulates human perception with bottom-up strategies, followed by precise cell structure extraction within each cell salient attention window (CASW) using a parameter-controlled adaptive salient mechanism (PCASM). The first step locates several CASWs in the blood cell image, and the second step accurately extracts the nucleus and cytoplasm in each CASW. Experimental results demonstrate that the proposed method has sufficient accuracy and speed to be used in an automatic blood analyzer.
The topology information of a grey image can be carried by a series of discrete points, with different numbers of points representing different dimensions of information. The human brain can easily group these points and perceive information in different dimensions. In this paper, we present how to use multi-scale discrete points to segment, extrude, and detect objects. The points are divided into yin points and yang points according to their position in a digital image. We also use a method based on peripheral direction contributions to group these points for round building detection and road detection. Primary experimental results show that our approach can be used in several representative object detection applications with laudable performance.
KEYWORDS: Signal to noise ratio, Target detection, Optical engineering, Gaussian filters, Visual process modeling, Digital signal processing, Image enhancement, Performance modeling, Image analysis, Detection and tracking algorithms
In this paper, an approach for fast ship detection in infrared (IR) images based on a multi-resolution attention mechanism is proposed. To realize real-time image analysis, an attention mechanism is indispensable for focusing computational resources only on regions or information relevant to the task at hand. This paper discusses the sampling model; index information generation, i.e., searching for areas of interest (AOI); next fixation point determination; and target detection. The variance of neighboring nodes in the periphery is used to form a saliency map of the image. A node with higher saliency is more likely to be an engine or other hot part of a ship, while a straight line just below it can confirm the ship hypothesis. The paper first introduces the sampling model and then discusses 'index region' detection and the subsequent saccade and analysis process. Finally, experimental results on detecting ships of different sizes in infrared images are presented, demonstrating that our approach finds ship targets effectively. Performance comparisons between our approach and several other approaches are also presented.
Autonomous real-time fingerprint verification, i.e., judging whether two fingerprints come from the same finger, is an important and difficult problem in Automated Fingerprint Identification Systems (AFIS). Besides nonlinear deformation, two fingerprints from the same finger may also appear dissimilar due to translation or rotation; all these factors enlarge the dissimilarities and lead to misjudgment, so the correct verification rate depends highly on the degree of deformation. In this paper, we present a new, fast, and simple fingerprint matching algorithm, derived from Chang et al.'s method, to solve the problem of optimally matching two fingerprints under nonlinear deformation. The proposed algorithm uses not only the feature points of fingerprints but also multiple ridge cues to reduce the computational complexity of verification. Experiments with a number of fingerprint images have shown that this algorithm is more efficient than existing methods owing to the reduced number of search operations.
KEYWORDS: Nonuniformity corrections, Digital signal processing, Signal processing, Evolutionary algorithms, Control systems, Sensors, Image processing, Neural networks, Staring arrays, Digital image processing
All IRFPAs require nonuniformity correction. Although two-point or multi-point correction algorithms can correct the nonuniformity of IRFPAs, they are limited by pixel nonlinearities and instabilities, so adaptive nonuniformity correction techniques are needed. Many researchers have developed real-time correction methods based on the scene being viewed; when the correction is completed within the IRFPA sensor itself, the device is called a smart FPA, but smart IRFPAs are still under development. The purpose of this paper is to describe digital signal processing electronics for nonuniformity correction, comprising an ADSP21060 digital signal processor, an 8751 microcontroller, and a display circuit module. Image data from the IRFPA are written into dual-port RAM; the ADSP21060 DSP performs the nonuniformity correction while the 8751 handles control, and the corrected results are displayed on a monitor. A neural network algorithm and the constant-statistics algorithm were tested in our digital implementation. For a 60×97 image, the processing time per frame is 14.75 ms for the neural network algorithm and 12.65 ms for the constant-statistics algorithm. Measured results show that our digital processing system meets the demands of real-time scene-based nonuniformity correction for small IRFPAs.
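The constant-statistics algorithm mentioned above can be sketched compactly: assuming every detector observes, over time, scene radiance with the same mean and deviation, per-pixel offset and gain are estimated from the temporal statistics of a frame stack. This is a batch formulation for illustration; the DSP implementation updates these statistics recursively, frame by frame.

```python
import numpy as np

def constant_statistics_nuc(frames):
    """Constant-statistics nonuniformity correction.

    frames: (T, H, W) stack of raw IRFPA frames.
    Returns the corrected stack, normalised so each pixel has zero
    temporal mean and unit temporal deviation.
    """
    mean = frames.mean(axis=0)          # per-pixel offset estimate
    std = frames.std(axis=0) + 1e-6     # per-pixel gain estimate
    return (frames - mean) / std
```

Pixels with different fixed-pattern gains and offsets are thereby mapped onto a common response, which is exactly what removes the nonuniformity.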
Scene matching is one of the most basic and important techniques in modern information processing. It is the spatial registration of two images of the same scene taken by two different sensors, from which their relative displacement is obtained.
To overcome the slow 1 Hz update rate of GPS and the unbounded position error of an INS, this paper presents a continuous high-accuracy positioning algorithm that filters and predicts GPS and INS positioning data with a convex combination of a linear two-point and a quadratic five-point polynomial. An integration experiment with real GPS data and simulated INS data has shown the validity of the method. It provides a new approach to continuous, high-accuracy, real-time positioning in medium or highly dynamic environments.
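A hedged sketch of the combined predictor: a linear fit through the last two fixes and a quadratic fit through the last five, blended by a convex weight w. The fixed weight and the function signature are illustrative assumptions; the paper's actual weighting rule may differ.

```python
import numpy as np

def predict_position(times, positions, t_next, w=0.5):
    """Predict the position at t_next from past fixes.

    times, positions: 1-D arrays of at least five past fixes (one axis).
    w: convex weight between the linear and quadratic predictors.
    """
    # Linear two-point predictor: a degree-1 fit through the last 2 fixes.
    linear = np.polyval(np.polyfit(times[-2:], positions[-2:], 1), t_next)
    # Quadratic five-point predictor: a degree-2 fit through the last 5.
    quad = np.polyval(np.polyfit(times[-5:], positions[-5:], 2), t_next)
    return w * linear + (1.0 - w) * quad
```

Between 1 Hz GPS updates, such a predictor fills in intermediate positions, while the quadratic term captures acceleration in dynamic motion.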
In this paper the covering-blanket method, widely used to estimate fractal dimension, is improved. The D-dimensional area K, which has not been detailed in previous references, is clarified and further extended to a fractal signature as a function of scale and space. After defining two discrepancy measurements of the multiscale fractal signature, an algorithm for man-made target detection based on fractal signature change is presented and tested on forward-looking infrared images of sea surfaces collected by a long-wave infrared camera. Results using the D-dimensional area are compared with those using the fractal dimension and suggest that the proposed method performs better at detecting ship targets embedded in natural scenes.
When an airborne imaging sensor moves from far to near, the image sequence of a scene captured by a non-zoom imaging system varies dramatically in scale. State-of-the-art methods based on a single invariant feature or a simple feature, such as moment invariants, shape-specific points, topological features, or Fourier descriptors, are useless for representing and recognizing a multiscale object in such an image sequence. Even the image gray-pyramid technique, which has great potential for pattern recognition by template matching at different resolutions, cannot provide satisfactory performance because the resolution of the real images is not known exactly, so there is an increasing need for better multiscale object representation and recognition. In this paper, we develop a class of algorithms for representing and recognizing a multiscale object in image sequences taken by a sensor moving from far to near: a hierarchy features model (HFM) intended to represent a size-changing object, and a sequential object recognition algorithm (SORA) based on this model to recognize it. Experimental results with many real visual and infrared images and simulated images show that when a non-zoom imaging sensor moves from far to near, the HFM is suitable for representing a multiscale object and the SORA for recognizing it.
In this paper, we present morphological processing using the median operation for small object detection. First, we perform a median morphological operation on the gray-scale image with structuring element A, which makes all scene regions of size equal to or larger than the central area of A brighter (for bright objects) or darker (for dark objects), while leaving other regions approximately unchanged. Second, we perform a median morphological operation with a larger structuring element B, which makes all scene regions of size equal to or smaller than the central area of B darker (for bright objects) or brighter (for dark objects), again leaving other regions approximately unchanged. Third, we compute the absolute difference of the two outputs: all object regions between the smallest and largest sizes are enhanced and all background regions are weakened, so a simple threshold can extract all objects together with some smaller background regions. Finally, those background regions whose areas are smaller than structuring element A are eliminated by region labeling. We find that if (1) the object regions, in contrast to the background, show a signature of discontinuity with their neighboring regions, and (2) each object is relatively concentrated in a small region that can be considered homogeneous and compact, our algorithm achieves satisfactory detection performance.
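The three-step enhancement can be sketched directly in NumPy. The square window sizes below stand in for the structuring elements A and B, and the plain sliding-window median (edges left unchanged) is an illustrative simplification of the morphological median operation.

```python
import numpy as np

def median_filter(img, size):
    """Plain sliding-window median; border pixels are left unchanged."""
    out = img.astype(float).copy()
    r = size // 2
    H, W = img.shape
    for y in range(r, H - r):
        for x in range(r, W - r):
            out[y, x] = np.median(img[y - r:y + r + 1, x - r:x + r + 1])
    return out

def small_object_map(img, small=3, large=9):
    """Absolute difference of two medians enhances objects whose size
    lies between the two windows and flattens background.
    """
    keep = median_filter(img, small)   # objects >= small survive
    drop = median_filter(img, large)   # objects <= large are removed
    return np.abs(keep - drop)
```

A bright blob between the two window sizes survives the small median but vanishes under the large one, so the difference map isolates it for simple thresholding.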
KEYWORDS: Signal to noise ratio, Detection and tracking algorithms, Target detection, 3D acquisition, Digital filtering, Image processing, Electronics, Fourier transforms, Image filtering, Optical filters
The movement model of targets handled by track-before-detect approaches is extended in this paper to targets with constant acceleration, and an algorithm based on a linear-variant-coefficient difference equation for moving target indication is proposed. Moreover, based on parametric models of target and background, this paper analyzes the optimal SNR gain versus target and background characteristics, as well as the sensitivity of this gain to model mismatch.
A method based on a quasi-continuity filter for detecting moving pixel-sized targets under low SNR is presented in this paper. First, each frame of the input sequence is binarized according to the maximum error probability rule. Then the quasi-continuity filter exploits the continuity of target pixels across adjacent frames and the randomness of noise pixels to filter out the noise.
This paper discusses the problem of radar-to-optical scene matching. Because of the different imaging principles, the similarity between radar images and optical images is poor, and among region features only one relatively stable common characteristic exists: large-scale regions. This paper presents a fast and effective matching method that meets the demand of fast orientation: region segmentation is used to extract large-scale regions in a radar image, the large-scale object is recognized according to knowledge of the object region in the corresponding optical image to perform coarse location, and an additional template matching step performs fine location.
A method that makes a Hopfield neural network perform point pattern relaxation matching is proposed. Its advantage is that the relaxation matching process can be performed in real time thanks to the massively parallel information-processing capability of the neural network. Experimental results with large simulated images prove the effectiveness and feasibility of performing point relaxation matching with a Hopfield neural network.
KEYWORDS: Target detection, Signal to noise ratio, Optical engineering, Detection and tracking algorithms, 3D acquisition, 3D image processing, Fractal analysis, Signal detection, 3D displays, Object recognition
A contrast-based texture segmentation method is presented in this paper. Measurements such as the contrast and homogeneity of regions are defined with visual characteristics in mind, and fast algorithms are used to compute them. Example images and their segmentation results are also provided.
Following the principle by which humans discriminate a small object from a natural scene, namely the signature of discontinuity between the object and its neighboring regions, we develop an efficient algorithm for small object detection based on template matching with a dissimilarity measure called the average gray absolute difference maximum map (AGADMM). We infer the criterion for recognizing a small object from the properties of the AGADMM of a natural scene, which is a spatially independent and stable Gaussian random field; explain how the AGADMM improves the detection probability while keeping the false alarm probability very low; analyze the complexity of computing the AGADMM; and justify its validity and efficiency. Experiments with visual images of natural scenes such as sky and sea surface show the great potential of the proposed method for distinguishing small man-made objects from natural scenes.