In real life, haze causes great inconvenience and endangers traffic and pedestrian safety, yet previous studies have paid little attention to text detection in hazy scenes. In our experiments, we found that the candidates obtained by the general non-maximum suppression (NMS) method or the soft-NMS method rarely match the ground truth precisely, and that incomplete feature extraction degrades the final performance. In this work, a text detection framework for hazy scenes is designed. An optimized NMS and an optimized long short-term memory (LSTM) for spatial and temporal feature extraction are proposed to improve text detection performance. In addition, a hazy scene text dataset (named HSText-1000) and a hybrid scene text dataset (named MHSText-4600) have been built in this work for convenient performance evaluation; both have been released and can be downloaded from https://github.com/lyy0117/lyy. Experimental results illustrate that our method is superior to several state-of-the-art methods in both the hazy scene and the hybrid scene. Meanwhile, we achieved competitive results on the non-haze public dataset ICDAR 2013, which indicates that our method has satisfactory adaptability. We will release code to facilitate community research.
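The abstract does not detail the optimized NMS; for context, the sketch below shows the standard greedy NMS baseline it is compared against, assuming boxes are given as (x1, y1, x2, y2) with confidence scores. The IoU threshold is an illustrative value, not the paper's setting.

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy NMS baseline (not the paper's optimized variant).

    boxes  : (N, 4) array of [x1, y1, x2, y2]
    scores : (N,) array of confidence scores
    Returns indices of the boxes that are kept.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop remaining boxes that overlap the kept box too strongly
        order = order[1:][iou <= iou_thresh]
    return keep
```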
Recently, many works have tried to capture contextual information to benefit semantic segmentation. However, most approaches obtain context uniformly, meaning every pixel gathers its context from the same region. We argue that, for each pixel, contextual information aggregated from the region it belongs to can benefit the dense prediction, while context from irrelevant regions may mislead the prediction. In this work, we propose a Region Context Module (RCM) that aggregates context for each pixel only from its object region. Furthermore, we design a Region Context Network (RCNet) that embeds the ASPP module and the Region Context Module. We conduct experiments on three datasets: Cityscapes, Vaihingen, and Potsdam. Extensive quantitative and qualitative evaluations demonstrate that our model achieves favorable performance against state-of-the-art approaches.
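The RCM itself is not specified in this abstract; the sketch below illustrates one plausible reading of per-pixel context restricted to the pixel's own region: features are average-pooled within each region of a hypothetical region map and appended to every pixel of that region.

```python
import numpy as np

def region_context(features, region_map):
    """Aggregate context for each pixel only from the region it belongs to.

    features   : (C, H, W) feature map
    region_map : (H, W) integer map assigning every pixel to a region
                 (e.g. a coarse segmentation prediction; hypothetical input)
    Returns a (2C, H, W) map: original features concatenated with the
    average-pooled feature of the pixel's own region.
    """
    context = np.zeros_like(features)
    for r in np.unique(region_map):
        mask = region_map == r                   # (H, W) boolean region mask
        pooled = features[:, mask].mean(axis=1)  # (C,) region descriptor
        context[:, mask] = pooled[:, None]       # broadcast to region pixels
    return np.concatenate([features, context], axis=0)
```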
KEYWORDS: Data modeling, Convolutional neural networks, Computer programming, Information technology, Computing systems, Evolutionary algorithms, Information theory
Vehicle part recognition aims to determine the subcategory of each vehicle part. Existing algorithms treat the recognition of each part as an independent classification task, ignoring the potential co-occurrence relationship between vehicle parts. In addition, it remains challenging to obtain satisfactory results due to small intra-class differences. In this paper, we propose a part-pair recognition method based on deep learning that exploits the co-occurrence relationship. Specifically, we construct a deep neural network for vehicle part recognition that uses the co-occurrence relationship and recognizes two vehicle parts simultaneously. We also build a large-scale dataset of vehicle parts with fully annotated labels for training and testing. Extensive experimental results demonstrate that the proposed method performs favorably against state-of-the-art vehicle recognition algorithms.
In this paper, we propose a learning method for blind deblurring of Gaussian-blurred images that exploits edge cues via a deep multi-scale generative adversarial network, DeepEdgeGAN. The edges of the blurred image are fed into DeepEdgeGAN together with the blurred image itself, providing a strong prior constraint for restoration; this helps address the problem that gradients of images restored by GAN-based methods tend to be overly smooth and insufficiently sharp. Further, we introduce perceptual, edge, and scale losses to train DeepEdgeGAN. With the trained end-to-end model, we directly restore the latent sharp image from a blurred image, avoiding the estimation of per-pixel blur kernels. Qualitative and quantitative experiments demonstrate that the visual quality of the restored images improves significantly.
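The exact loss formulation used to train DeepEdgeGAN is not given here; the snippet below sketches a generic gradient-based edge loss plus a weighted combination with placeholder perceptual and adversarial terms. The weights and the single-channel input assumption are illustrative, not the paper's choices.

```python
import numpy as np

def edge_loss(restored, sharp):
    """L1 difference between image gradients (single-channel images),
    a simple edge-preserving term."""
    gy_r, gx_r = np.gradient(restored)
    gy_s, gx_s = np.gradient(sharp)
    return np.abs(gx_r - gx_s).mean() + np.abs(gy_r - gy_s).mean()

def total_loss(restored, sharp, perceptual, adversarial,
               w_edge=1.0, w_perc=1.0, w_adv=0.01):
    """Weighted sum of content, edge, perceptual, and adversarial terms.
    The weights here are illustrative placeholders, not the paper's values."""
    content = np.abs(restored - sharp).mean()
    return (content
            + w_edge * edge_loss(restored, sharp)
            + w_perc * perceptual
            + w_adv * adversarial)
```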
This paper proposes a 3D pose estimation method for weak-texture objects that performs point matching between a test image and a matched rendering image of the object, rather than its 3D model. Given a 3D model of an object, we use an exemplar-based 2D-3D matching method to estimate the coarse pose of the object. We first obtain 2D rendering images of each view of the object using its 3D model and build an exemplar-based model from all the rendering images. For a test image, we then perform 2D-3D matching using the proposed model, and the rendering image with the highest score is taken as the best match to the test image. The coarse pose can be obtained from the view parameters of the rendering image. Finally, we perform point matching between the matched rendering image and the test image to estimate the pose more accurately. The proposed coarse-to-fine pose estimation method provides a stronger constraint, which makes pose estimation more accurate. The experimental results demonstrate the effectiveness of the proposed method.
Vehicle part detection plays an important role in public transportation safety and mobility; its goal is to locate each vehicle part. We propose a new approach that combines Faster R-CNN with a three-level cascaded convolutional neural network (DCNN). The output of Faster R-CNN is a series of bounding boxes with coordinate information, from which we can locate vehicle parts. The DCNN can precisely predict the feature point position, which is the center of a vehicle part. We design an output strategy that combines these two results, which has two advantages. First, the quality of the bounding boxes is greatly improved, so the feature point position of each vehicle part can be located more precisely. Second, we preserve the positional relationship between vehicle parts and effectively improve the validity and reliability of the result. With our algorithm, the performance of vehicle part detection improves markedly compared with Faster R-CNN alone.
KEYWORDS: Image processing, Image restoration, Image analysis, Process modeling, Convolution, Machine learning, Lithium, Fluctuations and noise, Information technology, Data processing
Image deblurring aims to estimate the blur kernel and restore the latent image, and it is usually divided into two stages: kernel estimation and image restoration. In kernel estimation, selecting a good region that contains structure information helps improve the accuracy of the estimated kernel. Such regions are usually chosen by experts or found by trial and error. In this paper, we apply a metric named relative total variation (RTV) to discriminate structure regions from smooth and textured ones. Given a blurry image, we first calculate the RTV of each pixel to determine whether it lies in a structure region, and then sample the image with overlapping windows. Finally, the sampled region that contains the most structure pixels is selected as the best region to deblur. Both qualitative and quantitative experiments show that the proposed method helps estimate the kernel accurately.
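As a rough illustration of the selection pipeline described above (not the authors' exact RTV formulation), the sketch below scores each pixel by a simplified relative total variation, so pixels with coherent gradients in a local window count as structure, then slides overlapping windows and returns the one containing the most structure pixels. Window size, threshold, and stride are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def structure_map(img, win=9, eps=1e-3, thresh=0.4):
    """Simplified relative total variation: |windowed sum of gradients|
    over windowed sum of |gradients|. High values indicate structure."""
    gy, gx = np.gradient(img.astype(np.float64))
    num = np.hypot(uniform_filter(gx, win), uniform_filter(gy, win))
    den = uniform_filter(np.hypot(gx, gy), win) + eps
    return (num / den) > thresh                      # boolean structure mask

def best_region(img, size=128, stride=32, **kw):
    """Return the top-left corner of the overlapping window with the most
    structure pixels; that window is taken as the region to deblur."""
    mask = structure_map(img, **kw)
    best, best_count = (0, 0), -1
    for y in range(0, img.shape[0] - size + 1, stride):
        for x in range(0, img.shape[1] - size + 1, stride):
            count = mask[y:y + size, x:x + size].sum()
            if count > best_count:
                best, best_count = (y, x), count
    return best
```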
As an important information carrier, text plays a significant role in many applications. However, text detection in unconstrained scenes is a challenging problem due to cluttered backgrounds, varied appearances, uneven illumination, etc. In this paper, an approach based on multi-channel information and local context is proposed to detect text in natural scenes. Since character candidate detection plays a vital role in a text detection system, Maximally Stable Extremal Regions (MSERs) and a graph-cut based method are integrated to obtain character candidates by leveraging multi-channel image information. A cascaded false-positive elimination mechanism is constructed from the perspectives of the character and the text line, respectively. Because local context information is valuable, it is utilized to retrieve missing characters and boost text detection performance. Experimental results on two benchmark datasets, the ICDAR 2011 dataset and the ICDAR 2013 dataset, demonstrate that the proposed method achieves state-of-the-art performance.
Detection and recognition of vehicles are two essential tasks in intelligent transportation systems (ITS). Currently, a prevalent method is to first detect the vehicle body, logo, or license plate, and then recognize it, so the detection task is the most basic yet most important step. Besides the logo and license plate, other parts, such as the vehicle face, lamps, windshield, and rearview mirrors, are also key parts that reflect the characteristics of a vehicle and can be used to improve the accuracy of the recognition task. In this paper, we study the novel task of vehicle part detection. We choose Faster R-CNN as the basic algorithm and take the local image region where the vehicle body is located as input, obtaining multiple bounding boxes with their own scores. If the box with the maximum score is directly chosen as the final result, it is often not the best one, especially for small objects. This paper presents a method that corrects the original score with relative position information between two parts; we then choose the box with the maximum comprehensive score as the final result. Compared with the original output strategy, the proposed method performs better.
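The exact correction rule is not given in this abstract; the following sketch shows one way a detector's confidence could be combined with a relative-position compatibility term between two parts. The linear weighting and the field names are assumptions for illustration only.

```python
def comprehensive_score(det_score, position_score, alpha=0.7):
    """Combine the detector's confidence with a relative-position term.
    alpha is an illustrative weight, not the paper's value."""
    return alpha * det_score + (1.0 - alpha) * position_score

def pick_best(boxes):
    """boxes: list of dicts with 'score' (detector confidence) and
    'pos_score' (compatibility with the other part's location).
    Returns the box with the maximum comprehensive score."""
    return max(boxes, key=lambda b: comprehensive_score(b['score'], b['pos_score']))
```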
Person re-identification (re-id) aims to match a specified person across non-overlapping cameras, which remains a very challenging problem. While previous methods mostly focus on feature extraction or metric learning, this paper attempts to jointly learn both global full-body and local body-part features of the input persons with a multi-channel convolutional neural network (CNN) model, trained by an adaptive triplet loss function that minimizes the distance between images of the same person and maximizes the distance between different persons. The experimental results show that our approach achieves very promising results on the large-scale Market-1501 and DukeMTMC-reID datasets.
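As a reference for the training objective mentioned above, here is a minimal (non-adaptive) triplet loss on embedding vectors; the adaptive variant used in the paper is not specified here, and the margin value is an assumption.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss: pull the same identity together, push
    different identities apart by at least `margin` (illustrative value).

    anchor, positive, negative : (D,) embedding vectors
    """
    d_ap = np.linalg.norm(anchor - positive)   # distance to same identity
    d_an = np.linalg.norm(anchor - negative)   # distance to different identity
    return max(0.0, d_ap - d_an + margin)
```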
The Log-Gabor transform, which is suitable for analyzing gradually changing data such as iris and face images, has been widely used in image processing, pattern recognition, and computer vision. In most cases, only the magnitude or the phase information of the Log-Gabor transform is considered. However, the complementary effect of combining magnitude and phase information simultaneously for image feature extraction has not been systematically explored in existing works. We propose a local image descriptor for face recognition, called the Log-Gabor Weber descriptor (LGWD). The novelty of LGWD is twofold: (1) to fully utilize the information from the magnitude and phase features of the multi-scale, multi-orientation Log-Gabor transform, we apply the Weber local binary pattern operator to each transform response; (2) the encoded Log-Gabor magnitude and phase information are fused at the feature level using a kernel canonical correlation analysis strategy, considering that feature-level fusion is effective when the modalities are correlated. Experimental results on the AR, Extended Yale B, and UMIST face databases, compared with those available from recent experiments reported in the literature, show that our descriptor yields better performance than state-of-the-art methods.
We propose an approach for textured image segmentation based on amplitude-modulation frequency-modulation (AM-FM) models. An image is modeled as a set of 2-D nonstationary sinusoids with spatially varying amplitudes and spatially varying frequency vectors. First, the demodulation procedure for the models furnishes a high-dimensional output at each pixel. Then, features including texture contrast, scale, and brightness are carefully selected based on the high-dimensional output and the image itself. Next, a normalization and weighting scheme for feature combination is presented. Finally, simple K-means clustering is used for segmentation. The main characteristic of this work is a feature vector that strengthens useful information while having fewer dimensions. The proposed approach is compared with the dominant component analysis (DCA) + K-means algorithm and the DCA + weighted curve evolution algorithm on three different datasets. The experimental results demonstrate that the proposed approach outperforms the others.
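The normalization, weighting, and clustering steps can be pictured as in the sketch below; the per-feature weights and the number of clusters are assumptions for illustration, and scikit-learn's KMeans stands in for the simple K-means step.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment(features, weights, n_segments=4):
    """features : (H, W, F) per-pixel features (e.g. contrast, scale, brightness)
    weights  : (F,) relative importance of each feature (illustrative values)
    Returns an (H, W) label map from K-means clustering."""
    H, W, F = features.shape
    X = features.reshape(-1, F).astype(np.float64)
    # Normalize each feature to zero mean / unit variance, then weight it
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    X *= weights
    labels = KMeans(n_clusters=n_segments, n_init=10).fit_predict(X)
    return labels.reshape(H, W)
```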
Conventional object detection and localization approaches spend extensive time processing sliding background windows that bear no resemblance to the object. Global context of the subwindow helps alleviate this problem. In addition, many patch-based approaches fail to search patches at the correct locations, and local context can help resolve that. We propose an object detection framework that is top-down and simple to implement. It combines global contextual features, local contextual features, and local appearance features in a coarse-to-fine cascade, which enables fast detection. These three features play different roles in the detection process, and the resulting rich representation makes detection robust and effective. The proposed approach shows satisfactory performance in both speed and accuracy.
We develop a novel approach for the object detection and localization task. This paper proposes a novel representation of local regions around keypoints, called lifetime; the lifetime of a keypoint describes its stability. Together with a geometric relationship extractor, lifetime representations are embedded into a bag-of-features framework. The framework has the following properties. First, keypoints are represented by their lifetime rather than being vector-quantized. Second, a simple and computationally efficient spatial pyramid structure is used to extract the geometric relationships between keypoints. We demonstrate the efficacy of the proposed approach on the UIUC car dataset. The experimental results show that our approach performs excellently for object detection and localization.
Creating a visual codebook is an important problem in object recognition. Using a compact visual codebook can boost computational efficiency and reduce memory cost. A simple and effective method is proposed for visual feature codebook construction. On the basis of a feedforward hierarchical model, a robust local descriptor is proposed and an a priori statistical scheme is applied to the class-specific feature-learning stage. The experiments show that the proposed approach achieves reliable performance with shorter codebook length, and incremental learning can be easily enabled.