The role of barcodes in industrial processes and product labeling is growing rapidly, so the problem of recognizing them with mobile cameras in uncontrolled environments is becoming increasingly pressing. Advances in neural networks and computer vision make it possible to solve this problem with high accuracy, provided there is enough representative data to train and test the algorithms. However, barcodes can contain sensitive information, and large volumes of such data cannot be published. Synthetic image generation can solve this problem, and in this paper we propose a method for generating semi-synthetic, natural-looking 2D barcodes with illumination changes, blur, and projective distortion, which can be used to create training or testing data for localization problems. We introduce the SE-DMTX-SYN-1000 dataset, designed for fine localization of Data Matrix codes. We validate localization accuracy using the ZXing, zxing-cpp, and libdmtx barcode reading libraries and demonstrate that this benchmark is quite challenging and can help to improve Data Matrix localization.
KEYWORDS: Binary data, Education and training, Detection and tracking algorithms, Image processing algorithms and systems, Image classification, Distance measurement, Digital imaging, Data processing, Time metrology
We present FARA, a novel approach for fast approximation of RFD-like descriptors in the context of document retrieval systems. RFD-like descriptors are widely used for document representation, but their computation is expensive, especially for large document collections. Our method is a CPU-friendly approximation of gradient map computation with sequential memory access and integer-only calculations. Only three types of operations are used: addition, subtraction, and absolute value. This allows effective use of SIMD extensions, further increasing the running speed. Experimental results demonstrate that FARA achieves the same accuracy as RFDoc descriptors while significantly reducing the computational overhead. The proposed approach achieves a twofold speedup of gradient map computation and a 25% acceleration of overall descriptor computation time compared to the most efficient RFDoc implementation.
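The abstract does not specify the exact FARA kernels, but the integer-only restriction can be illustrated with a minimal sketch: gradient maps computed with nothing but integer subtraction and absolute value, and a magnitude approximation that avoids multiplication and square roots entirely (the kernels and magnitude formula here are assumptions for illustration).

```python
import numpy as np

def gradient_maps_int(img):
    """Hypothetical integer-only gradient sketch: central differences
    for dx/dy, |dx| + |dy| as a multiply-free magnitude estimate."""
    img = img.astype(np.int32)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # subtraction only, x axis
    dy[1:-1, :] = img[2:, :] - img[:-2, :]   # subtraction only, y axis
    mag = np.abs(dx) + np.abs(dy)            # no sqrt, no multiply
    return dx, dy, mag
```

Because every step is an elementwise add, subtract, or absolute value over contiguous memory, each line maps directly onto SIMD lanes.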
In the paper, we present a quantization method for bipolar morphological neural networks. Bipolar morphological neural networks use only addition, subtraction, and maximum operations inside the neuron and exponent and logarithm as activation functions of the layers. These operations allow fast and compact gate implementation for FPGA and ASIC, which makes these networks a promising solution for embedded devices. Quantization allows us to reach an additional increase in computational efficiency and reduce the complexity of hardware implementation by using integer values of low bitwidth for computations. We propose an 8-bit quantization scheme based on integer maximum, addition, and lookup tables for non-linear functions and experimentally demonstrate that basic models for image classification can be quantized without noticeable accuracy loss. More advanced models still provide high recognition accuracy but would benefit from further fine-tuning.
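The quantization scheme above can be sketched as follows. This is a simplified illustration, not the paper's exact scheme: the fixed-point scale, clipping range, and LUT construction are assumptions; only the structure (integer addition and maximum for the neuron body, a 256-entry lookup table for the exponent) follows the description.

```python
import numpy as np

SCALE = 16.0  # assumed fixed-point scale: real value = int8 code / SCALE

def quantize(x):
    """Map real values to int8 codes (hypothetical symmetric scheme)."""
    return np.clip(np.round(np.asarray(x) * SCALE), -128, 127).astype(np.int8)

# Lookup table replacing exp() over the full int8 code range,
# so the non-linear activation costs a single table read.
EXP_LUT = np.clip(np.round(np.exp(np.arange(-128, 128) / SCALE) * SCALE),
                  -128, 127).astype(np.int8)

def bm_neuron_q(x_q, w_q):
    """Quantized bipolar morphological step: integer add + max,
    then the LUT stands in for the exponent activation."""
    s = np.max(x_q.astype(np.int16) + w_q.astype(np.int16))
    s = int(np.clip(s, -128, 127))
    return EXP_LUT[s + 128]
```

Keeping the accumulator in int16 before clipping back to the int8 range mirrors how a hardware implementation would avoid overflow in the max-of-sums stage.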
Deep neural networks are widely used in various AI systems. Many such systems rely on the edge computing concept and try to perform computations on end devices while still being energy and memory efficient. Therefore, substantial time and memory requirements are imposed on neural networks. One way to improve neural network efficiency is to simplify computations inside a neuron. A bipolar morphological neuron uses only addition, subtraction, and maximum operations inside the neuron and exponent and logarithm as activation functions for the network layers. These operations allow fast and compact gate implementation for FPGA and ASIC. In the paper, we consider the usage of bipolar morphological (BM) networks for document binarization. We examine the DIBCO 2017 binarization challenge and train the bipolar morphological convolutional neural network of U-Net architecture. Despite some accuracy decrease for a model with all BM convolutional layers, one can flexibly control the accuracy by using the partially converted model. It should be noted that even the fully BM model is suitable for solving the problem in practice.
The implementations of the convolution operation in neural networks are usually based on convolution-to-GeMM (General Matrix Multiplication) transformation. However, this transformation requires a large intermediate buffer (called im2col or im2row), and its initialization is both memory- and time-consuming. To overcome this problem, one may use the Indirect Convolution Algorithm, which replaces the im2row buffer with a much smaller buffer of pointers, called the indirection buffer. However, it limits the choice of multiplication micro-kernel, making matrix multiplication slightly less efficient than in the classical GeMM algorithm. To address this, we propose the Almost Indirect Convolution Algorithm, which initializes a small, specifically ordered block of values used in matrix multiplication via the indirection buffer, in the same way GeMM algorithms initialize a block from the im2row buffer. Our approach combines the computational efficiency and flexibility in the shape of GeMM micro-kernels with the small memory footprint of the Indirect Convolution Algorithm. Experiments with convolutions of 8-bit matrices on ARM processors show that our convolution works 14-24% faster than the Indirect Convolution Algorithm for a small number of channels and 10-20% faster than the classical GeMM-based approach. This makes it well suited for computing the inference of 8-bit quantized networks on mobile devices.
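The indirection idea can be sketched in a few lines (a 1-D, single-batch illustration with index arrays standing in for pointers, not the paper's optimized ARM kernel): instead of materializing the full im2row matrix, the algorithm keeps only per-output-position row references and gathers one small block at a time for the multiplication micro-kernel.

```python
import numpy as np

def conv1d_indirect(x, w, block=4):
    """'Valid' 1-D convolution via small gathered blocks.
    x: (L, C) input rows, w: (K, C, M) weights."""
    L, C = x.shape
    K, _, M = w.shape
    out_len = L - K + 1
    # Indirection buffer: K input-row indices per output position
    # (a stand-in for the pointer buffer of the real algorithm).
    indir = np.array([[i + k for k in range(K)] for i in range(out_len)])
    w_mat = w.reshape(K * C, M)
    out = np.empty((out_len, M), dtype=x.dtype)
    for start in range(0, out_len, block):
        rows = indir[start:start + block]            # (b, K) references
        patch = x[rows].reshape(len(rows), K * C)    # gather one block only
        out[start:start + block] = patch @ w_mat     # micro-kernel GeMM
    return out
```

The peak extra memory is one `block × K·C` patch rather than the whole `out_len × K·C` im2row matrix, which is the trade-off the abstract describes.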
In this work we apply commonly known methods of non-adaptive interpolation (nearest pixel, bilinear, B-spline, bicubic, Hermite spline) and sampling (point sampling, supersampling, mip-map pre-filtering, rip-map pre-filtering, and FAST) to the problem of projective image transformation. We compare their computational complexity, describe their artifacts, and then experimentally measure their quality and running time on a mobile processor with ARM architecture. These methods were actively developed in the 1990s and early 2000s but have not been an area of active research in recent years due to a reduced need for computationally efficient algorithms. However, real-time mobile recognition systems, which attract more and more attention, not only require fast projective transform methods but also demand high-quality images without artifacts. As a result, in this work we identify methods appropriate for such systems that avoid artifacts while preserving low computational complexity. Based on our experimental results, these are bilinear interpolation combined with either mip-map pre-filtering or FAST sampling, though the choice can be adjusted for specific use cases.
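For reference, the bilinear interpolation favoured by the comparison above reduces to four reads and three linear blends per sample (a minimal scalar sketch, without the mip-map pre-filtering step or boundary policy choices a production transform would need):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a grayscale image at real-valued (x, y) by blending
    the four surrounding pixels with the fractional offsets."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)  # clamp at the border
    fx, fy = x - x0, y - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bot = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bot * fy
```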
In this paper we introduce novel bipolar morphological neuron and bipolar morphological layer models. The models use only addition, subtraction, and maximum inside the neuron, and exponent and logarithm as activation functions for the layer. Unlike previously introduced morphological neural networks, the proposed models approximate the classical computations and show better recognition results. We also propose a layer-by-layer approach to training bipolar morphological networks, which can be further developed into an incremental approach for separate neurons to achieve higher accuracy. Neither approach requires special training algorithms, and both can use a variety of gradient descent methods. To demonstrate the efficiency of the proposed model, we consider classical convolutional neural networks and convert their pre-trained convolutional layers to bipolar morphological layers. Since experiments on recognition of MNIST and MRZ symbols show only a moderate decrease in accuracy after conversion and training, the bipolar neuron model can provide faster inference and be very useful in mobile and embedded systems.
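The core approximation can be sketched as follows. This is a simplified two-term illustration inferred from the operations listed above (the actual model separates signs of inputs and weights more finely, and its handling of zeros is not shown here): each product becomes an addition of logarithms, the sum over inputs is approximated by a maximum, and sign polarities are combined by subtraction after the exponent.

```python
import numpy as np

def bm_neuron(x, w, eps=1e-12):
    """Approximate sum(x * w) with only add, max, exp, log:
    positive- and negative-polarity terms are reduced separately."""
    s = np.log(np.abs(x) + eps) + np.log(np.abs(w) + eps)  # add, not multiply
    pos = np.sign(x) * np.sign(w) > 0   # terms that would contribute > 0
    y = 0.0
    if pos.any():
        y += np.exp(np.max(s[pos]))     # max approximates the partial sum
    if (~pos).any():
        y -= np.exp(np.max(s[~pos]))
    return y
```

The approximation is exact when one term dominates each polarity, which is why converted layers lose only moderate accuracy and benefit from the subsequent training the abstract describes.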
In this paper we consider computational optimization of a recognition system on a Very Long Instruction Word (VLIW) architecture. Such an architecture is aimed at broad parallel execution and low energy consumption. We discuss VLIW features using the example of an Elbrus-based computational platform. As a case study, we consider a system for 2D art recognition. This system identifies a painting in an acquired image as a painting from the database, using local image features constructed from YACIPE keypoints and their RFD-based binary color descriptors, created as a concatenation of RFD-like descriptors for each channel. These are computed quickly, but the 2D art database is quite large, so in our case descriptor comparison using the Hamming distance during image matching consumes more than half of the execution time. This operation can be optimized with low-level techniques that exploit specific architectural features. We show efficient usage of intrinsic functions for the Elbrus-4C processor and memory access with the array prefetch buffer, which is specific to the Elbrus platform. We demonstrate a speedup of up to 11.5 times for large arrays and an overall system speedup of about 1.5 times without any changes in intermediate computations.
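The hot operation discussed above, Hamming distance between packed binary descriptors, is just XOR followed by a population count. A byte lookup table, as sketched below, is one portable way to express what the processor intrinsics compute in hardware (the sketch illustrates the operation itself, not the Elbrus-specific implementation):

```python
import numpy as np

# 256-entry table: number of set bits in each possible byte value.
POPCNT8 = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming(a, b):
    """Hamming distance between two packed binary descriptors
    (uint8 arrays of equal length): XOR, then count set bits."""
    return int(POPCNT8[np.bitwise_xor(a, b)].sum())
```

Since this runs once per descriptor pair over a large database, replacing the per-byte table reads with wide vector popcount instructions is exactly where the reported 11.5x comes from.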
In this paper we describe a stitching protocol that makes it possible to obtain high-resolution images of elongated monochromatic objects with periodic structure. This protocol can be used for long documents or for human-made objects in satellite images of uninhabited regions such as the Arctic. The length of such objects can be considerable, while modern camera sensors have limited resolution and cannot provide a good enough image of the whole object for further processing, e.g. use in an OCR system. The idea of the proposed method is to acquire a video stream containing the full object in high resolution and to use image stitching. We expect the scanned object to have straight boundaries and a periodic structure, which allows us to introduce regularization into the stitching problem and adapt the algorithm to the limited computational power of mobile and embedded CPUs. Using the detected boundaries and structure, we estimate the homography between frames and use this information to reduce the complexity of stitching. We demonstrate our algorithm on a mobile device and show an image processing speed of 2 fps on a Samsung Exynos 5422 processor.
An iterative algorithm is proposed for blind multi-image deblurring of binary images. Binarity is the only prior restriction imposed on the image. The image formation model assumes convolution with an arbitrary kernel and addition of a constant value. A penalty functional is composed using the binarity constraint for regularization. The algorithm estimates the original image and distortion parameters by alternately reducing two parts of this functional. Experimental results for natural (non-synthetic) data are presented.
In this paper, we introduce a slant detection method based on the Fast Hough Transform and demonstrate its application in an industrial system for the recognition of Russian passports. About 1.5% of these documents appear to contain slanted or italic text, which reduces the recognition rate, because optical character recognition systems are normally designed to process upright fonts. Our method uses the Fast Hough Transform to analyse vertical strokes of characters, extracted with the help of the x-derivative of a text line image. To improve the quality of the detector, we also introduce field grouping rules. The resulting algorithm achieves high detection quality. Almost all errors of the considered approach occur on passports with nonstandard fonts, while the slant detector itself works appropriately.
In this paper, we propose an expansion of convolutional neural network (CNN) input features based on the Hough Transform. We perform morphological contrasting of the source image followed by the Hough Transform, and then use the result as input for some of the convolutional filters. Thus, the CNN's computational complexity and number of units are not affected: morphological contrasting and the Hough Transform are the only additional computational expenses of the introduced input feature expansion. The proposed approach is demonstrated on a CNN with a very simple structure. We consider two image recognition problems: object classification on CIFAR-10 and printed character recognition on a private dataset of symbols taken from Russian passports. Our approach achieves a noticeable accuracy improvement at little computational cost, which can be extremely important for industrial recognition systems or for difficult problems utilising CNNs, such as pressure ridge analysis and classification.
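The extra input channel described above can be sketched with a straight-line Hough accumulator (a generic textbook version for illustration; the paper's morphological contrasting step and accumulator parameters are not specified here): the accumulator image is computed once per input and fed to its own set of convolutional filters.

```python
import numpy as np

def hough_image(img, n_theta=64):
    """Straight-line Hough accumulator of a binary image:
    rows index rho (offset by the diagonal), columns index theta."""
    h, w = img.shape
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((2 * diag, n_theta))
    ys, xs = np.nonzero(img)                      # edge/foreground pixels
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    for t, theta in enumerate(thetas):
        rho = (xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        np.add.at(acc, (rho, t), 1)               # vote per pixel
    return acc
```

Long strokes in the input collapse into bright peaks in this accumulator, which is the kind of global structure a small CNN cannot easily learn from raw pixels alone.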
This paper explores a layer-by-layer training method for neural networks that use approximate calculations and/or low-precision data types. The proposed method improves recognition accuracy using standard training algorithms and tools, while allowing neural network calculations to be sped up with fast approximate computations and compact data types. We consider 8-bit fixed-point arithmetic as an example of such an approximation for image recognition problems. Finally, we show a significant accuracy increase for the considered approximation along with a processing speedup.
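The 8-bit fixed-point arithmetic mentioned above can be illustrated as follows (the scale, rounding, and accumulator width here are assumptions chosen for the sketch, not the paper's exact scheme): values are stored as int8 with an implicit power-of-two scale, and dot products are accumulated in int32 before a shift brings the result back to the working scale.

```python
import numpy as np

FRAC_BITS = 5  # assumed scale: real value = int8 code / 2**FRAC_BITS

def to_fixed(x):
    """Quantize real values to int8 fixed-point codes."""
    return np.clip(np.round(np.asarray(x) * (1 << FRAC_BITS)),
                   -128, 127).astype(np.int8)

def fixed_dot(x_q, w_q):
    """Integer dot product: widen to int32 to avoid overflow,
    then shift back to the input scale."""
    acc = np.dot(x_q.astype(np.int32), w_q.astype(np.int32))
    return acc >> FRAC_BITS
```

Layer-by-layer training then lets the network adapt to the rounding error this introduces, instead of quantizing all layers at once.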
Computing image patch descriptors for correspondence problems relies heavily on hand-crafted feature transformations, e.g. SIFT and SURF. In this paper, we explore a Siamese pairing of fully connected neural networks for the purpose of learning discriminative local feature descriptors. The resulting ANN computes 128-D descriptors and demonstrates a consistent speedup compared to such state-of-the-art methods as SIFT and FREAK on PCs as well as in embedded systems. We use the L2 distance to reflect descriptor similarity during both training and testing, so the proposed feature descriptors can be easily compared to their hand-crafted counterparts. We also created a dataset augmented with synthetic data for learning local features, and it is available online. The augmentations provide training data that helps our descriptors generalise well to scaling, rotation, shift, Gaussian noise, and illumination changes.
Neural network calculations for image recognition problems can be very time-consuming. In this paper we propose three methods of increasing neural network performance on SIMD architectures. The usage of SIMD extensions is a way to speed up neural network processing available on a number of modern CPUs. In our experiments, we use ARM NEON as the example SIMD architecture. The first method uses the half-float data type for matrix computations; the second uses a fixed-point data type for the same purpose; the third considers a vectorized implementation of activation functions. For each method we set up a series of experiments with convolutional and fully connected networks designed for image recognition tasks.
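The third method, vectorized activation functions, can be illustrated with an activation built only from operations that map one-to-one onto SIMD lanes (the choice of a piecewise-linear "hard sigmoid" here is an illustrative assumption, not necessarily the function used in the experiments):

```python
import numpy as np

def hard_sigmoid(x):
    """Piecewise-linear sigmoid approximation: one multiply-add and
    two clamps per element, with no data-dependent branches, so a
    whole vector of activations can be processed per instruction."""
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)
```

An exact sigmoid would need a vectorized exponential; clamp-based approximations like this trade a small accuracy loss for a branch-free inner loop.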