Objects in an image contribute unequally to the semantics of the scene. Understanding the relative importance of objects to scene semantics is critical for various computer vision applications, such as scene recognition and image captioning. In this paper, we refer to an object's contribution to scene semantics as the degree of its gist and propose a method for Estimating the degree of the Gist of an Instance (EGoI). In the EGoI method, an instance's gist degree is estimated with two strategies: semantic-feature comparison and semantic-distance comparison. In the first strategy, the image is represented as a scene graph; the aggregated features and node features of different node combinations are then computed to estimate the gist of the instance excluded from each combination. In the second strategy, captions are generated for the complete image and for the image with an instance removed, and the semantic distance between these captions is used to estimate the gist of the deleted instance. Different strategies for estimating the gist degree of instances are tested in the experiments. The results show that the proposed method effectively quantifies an instance's contribution to scene semantics. Among the strategies, semantic-feature comparison discriminates best among the gist degrees of the various instances in a scene.
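As a hedged sketch of the semantic-distance strategy, the following Python scores a removed instance by the cosine distance between embeddings of the caption of the complete image and the caption of the image with the instance deleted. The captioner and the sentence-embedding function are placeholders (a toy bag-of-words embedding stands in below), not components specified by the paper.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def gist_degree(embed, caption_full, caption_ablated):
    """Gist degree of the removed instance: the farther the ablated
    caption drifts from the full caption, the larger its contribution.
    `embed` is any sentence-embedding function (hypothetical here)."""
    return cosine_distance(embed(caption_full), embed(caption_ablated))

# Toy stand-in embedding: bag-of-words counts over a tiny vocabulary.
def toy_embed(text, vocab=("man", "dog", "park", "bench", "walking")):
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float) + 1e-6

print(gist_degree(toy_embed, "a man walking a dog in a park",
                  "a man in a park"))   # removing "dog" yields a high score
```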
Building a 3D local surface feature on a local reference frame (LRF) provides rotational invariance and exploits 3D spatial information, thereby boosting the feature's distinctiveness. However, this benefit rests on the assumption that the LRF is stable and repeatable. Under disturbances such as noise, point-density variation, occlusion, and clutter, an LRF may become ambiguous and thus limit the capability of an LRF-based 3D local feature. This paper presents an efficient method for LRF construction. Experimental results on several popular datasets show that the proposed LRF outperforms state-of-the-art methods in repeatability and robustness. Moreover, our method is computationally efficient.
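The paper's own construction is not reproduced here; as a point of reference, the sketch below implements the common covariance-based LRF baseline (eigendecomposition of the neighborhood scatter with majority-vote sign disambiguation) that methods of this kind improve upon.

```python
import numpy as np

def covariance_lrf(points, query):
    """Baseline LRF: eigenvectors of the neighborhood covariance,
    with axis signs disambiguated by majority vote so the frame is
    repeatable under rotation. `points` is an (N, 3) neighborhood."""
    d = points - query
    cov = d.T @ d / len(points)
    w, v = np.linalg.eigh(cov)            # eigenvalues in ascending order
    z = v[:, 0]                           # normal: smallest eigenvalue
    x = v[:, 2]                           # direction of largest variation
    # Point each axis toward the majority of the neighbors.
    if np.sum(d @ z >= 0) < len(points) / 2:
        z = -z
    if np.sum(d @ x >= 0) < len(points) / 2:
        x = -x
    y = np.cross(z, x)                    # completes a right-handed frame
    return np.stack([x, y, z])            # rows are the axes

rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3)) * [3.0, 1.0, 0.2]   # flat-ish patch
print(covariance_lrf(pts, pts.mean(axis=0)))
```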
This paper presents an efficient 3D correspondence grouping algorithm for finding inliers in an initial set of feature matches. The novelty of our approach lies in combining pair-wise and triple-wise geometric constraints to filter outliers from the initial correspondences. The triple-wise geometric constraint considers three pairs of corresponding points simultaneously: a global reference point generated from the model shape is mapped to the scene shape, thereby forming a derived point. All the initial correspondences can then be filtered at once using the global reference point and the derived point through simple, low-level geometric constraints. Afterwards, the remaining correspondences are further filtered by a pair-wise geometric consistency algorithm, yielding more accurate matching results. The experimental results show the superior performance of our approach under noise, point-density variation, and partial overlap. Our algorithm strikes a good balance between accuracy and speed.
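As a minimal illustration of the pair-wise constraint, the sketch below exploits the fact that a rigid transform preserves inter-point distances: two correspondences are compatible only if their model-side and scene-side distances agree. The triple-wise constraint and the derived reference point are not reproduced, and the grouping heuristic shown is a generic stand-in.

```python
import numpy as np

def pairwise_consistent(model_pts, scene_pts, eps=0.05):
    """Keep correspondences whose pairwise distances are preserved.
    model_pts[i] <-> scene_pts[i] is the i-th putative correspondence."""
    dm = np.linalg.norm(model_pts[:, None] - model_pts[None], axis=-1)
    ds = np.linalg.norm(scene_pts[:, None] - scene_pts[None], axis=-1)
    compat = np.abs(dm - ds) < eps        # pairwise compatibility matrix
    # Greedy grouping: keep matches compatible with the best-supported one.
    seed = np.argmax(compat.sum(axis=1))
    return np.flatnonzero(compat[seed])

m = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
              [0, 0, 1], [1, 1, 0], [1, 0, 1]], float)
s = m + np.array([1.0, 2.0, 0.5])         # pure translation: all inliers
s[3] += np.array([1.0, 0.0, 0.0])         # corrupt the 4th correspondence
print(pairwise_consistent(m, s))          # [0 1 2 4 5]: index 3 is removed
```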
Feature-based visual Simultaneous Localization and Mapping (SLAM) has now reached a mature stage. Feature-based visual SLAM systems usually estimate camera poses without producing a dense surface, even if a depth camera is available. In contrast, dense SLAM systems output camera poses together with a dense surface of the reconstructed region. In this paper, we propose a new RGB-D dense SLAM system. First, the camera pose is estimated by minimizing a combination of the reprojection error and the dense geometric error. We construct a new type of edge in g2o that adds the constraints built from the dense geometric error to the graph optimization. The cost function is minimized in a coarse-to-fine scheme on the GPU, which raises the system frame rate and helps convergence under large camera motion. Second, to generate dense surfaces and give users feedback on the scanned regions, we use the surfel model to fuse the RGB-D stream and generate dense surface models in real time. After the system performs essential-graph optimization and full Bundle Adjustment (BA), the surfels in the dense model are updated with an embedded deformation graph to keep them consistent with the optimized camera poses. Third, a better 3D model is achieved by re-fusing the stream with the optimized camera poses once the user ends the reconstruction. We compare the accuracy of the generated camera trajectories and reconstructed surfaces with state-of-the-art systems on the TUM and ICL-NUIM RGB-D benchmark datasets. Experimental results show that the accuracy of the dense surfaces produced online is very close to that of the later re-fusion, and that our system produces more accurate camera trajectories than the state-of-the-art systems.
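A hedged sketch of the kind of joint tracking cost described above: sparse reprojection error plus a weighted dense point-to-plane term. The balance weight `lam` and the plain function form (rather than an actual g2o edge) are illustrative simplifications.

```python
import numpy as np

def joint_cost(T, feats3d, feats2d, K, src_pts, dst_pts, dst_normals, lam=0.1):
    """Combined tracking cost for pose T (4x4, world-to-camera):
    sparse reprojection term + lam * dense point-to-plane term."""
    R, t = T[:3, :3], T[:3, 3]
    # Reprojection error of sparse features.
    pc = feats3d @ R.T + t                      # points in the camera frame
    proj = pc @ K.T
    proj = proj[:, :2] / proj[:, 2:3]           # perspective division
    e_reproj = np.sum((proj - feats2d) ** 2)
    # Dense point-to-plane geometric error.
    residual = np.einsum('ij,ij->i', (src_pts @ R.T + t) - dst_pts, dst_normals)
    return e_reproj + lam * np.sum(residual ** 2)

K = np.array([[525., 0, 320], [0, 525., 240], [0, 0, 1]])
p3 = np.array([[0., 0., 2.]]); n = np.array([[0., 0., 1.]])
uv = p3 @ K.T; uv = uv[:, :2] / uv[:, 2:3]
print(joint_cost(np.eye(4), p3, uv, K, p3, p3, n))   # 0.0 at the true pose
```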
Simultaneous Localization and Mapping (SLAM) plays an important role in navigation and augmented reality (AR) systems. While feature-based visual SLAM has reached a mature stage, RGB-D-based dense SLAM has grown popular since the advent of consumer RGB-D cameras. Unlike feature-based visual SLAM systems, RGB-D-based dense SLAM systems such as KinectFusion compute camera poses by registering the current frame against images raycast from the global model, and they produce a dense surface by fusing the RGB-D stream. In this paper, we propose a novel reconstruction system built on ORB-SLAM2. To generate the dense surface in real time, we first fuse the RGB-D frames with a truncated signed distance function (TSDF). Because camera tracking drift is inevitable, it is unwise to represent the entire reconstruction space with one TSDF model or to represent the entire measured surface with the voxel hashing approach. Instead, we use the moving volume proposed in Kintinuous to represent the reconstruction region around the current frame frustum. Unlike Kintinuous, which corrects the points with an embedded deformation graph after pose-graph optimization, we re-fuse the images with the optimized camera poses and regenerate the dense surface after the user ends the scanning. Second, we use the reconstructed dense map to filter outliers from the sparse feature map. The depth maps of the keyframes are raycast from the TSDF volume according to the camera poses, and the feature points in the local map are projected into the nearest keyframe. If the discrepancy between the depth value of a feature and that of the corresponding point in the depth map exceeds a threshold, the feature is considered an outlier and removed from the feature map. The discrepancy value is also combined with the feature pyramid layer to compute the information matrix when minimizing the reprojection error, so features in the sparse map lying near the reconstructed dense surface exert a large influence on camera tracking. We compare the accuracy of the produced camera trajectories as well as the 3D models with state-of-the-art systems on the TUM and ICL-NUIM RGB-D benchmark datasets. Experimental results show that our system achieves state-of-the-art performance.
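A minimal sketch of the described outlier test, assuming a pinhole keyframe camera: project a map point into the keyframe, look up the depth raycast from the TSDF, and flag the feature when the discrepancy exceeds a threshold. The intrinsics and threshold are illustrative.

```python
import numpy as np

def is_depth_outlier(pt_w, T_cw, K, depth_map, tau=0.05):
    """True if map point pt_w disagrees with the keyframe's raycast
    depth map by more than tau (meters). T_cw: world-to-camera."""
    pc = T_cw[:3, :3] @ pt_w + T_cw[:3, 3]
    if pc[2] <= 0:
        return True                       # behind the camera
    u, v = (K @ pc)[:2] / pc[2]
    ui, vi = int(round(u)), int(round(v))
    h, w = depth_map.shape
    if not (0 <= ui < w and 0 <= vi < h):
        return True                       # projects outside the keyframe
    d = depth_map[vi, ui]
    return d <= 0 or abs(pc[2] - d) > tau

K = np.array([[525., 0, 320], [0, 525., 240], [0, 0, 1]])
depth = np.full((480, 640), 2.0)          # toy raycast depth map, 2 m wall
print(is_depth_outlier(np.array([0., 0., 2.0]), np.eye(4), K, depth))  # False
print(is_depth_outlier(np.array([0., 0., 2.5]), np.eye(4), K, depth))  # True
```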
The detection of ellipses in digital images is an important task in vision-based systems, since elliptical shapes are very common in nature and in man-made objects. Ellipse detection in real images is technically challenging in terms of both detection effectiveness and execution time. We propose an improved ellipse detection method with real-time performance on real-world images. We extract arcs from the edge mask and classify them into four classes according to edge direction and convexity. Using an arc-selection strategy, we select combinations of arcs that plausibly belong to the same ellipse and then estimate the ellipse parameters via least-squares fitting. Candidate ellipses are validated according to how well the estimate fits the actual edge pixels. Our method has been tested on three real-image datasets and compared with two state-of-the-art methods: it outperforms both. The results also show that the proposed method is suitable for real-time applications.
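As a hedged illustration of the fitting step only, the sketch below fits a general conic to edge points by least squares, taking the smallest right singular vector of the design matrix; the paper's arc extraction, grouping, and validation are not reproduced.

```python
import numpy as np

def fit_conic(x, y):
    """Least-squares conic a*x^2 + b*xy + c*y^2 + d*x + e*y + f = 0:
    the null vector of the design matrix (smallest singular vector)."""
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D)
    return Vt[-1]                          # [a, b, c, d, e, f]

# Points on an axis-aligned ellipse x^2/9 + y^2/4 = 1, with mild noise.
t = np.linspace(0, 2 * np.pi, 200)
rng = np.random.default_rng(0)
x = 3 * np.cos(t) + rng.normal(scale=0.01, size=t.size)
y = 2 * np.sin(t) + rng.normal(scale=0.01, size=t.size)
a, b, c, d, e, f = fit_conic(x, y)
print(b * b - 4 * a * c < 0)               # True: the fitted conic is an ellipse
```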
Point cloud registration is a fundamental task in high-level three-dimensional applications. Noise, uneven point density, and varying point cloud resolution are its three main challenges. In this paper, we design a robust and compact local surface descriptor called the Local Surface Angles Histogram (LSAH) and propose an effective coarse-to-fine algorithm for point cloud registration. The LSAH descriptor concatenates five normalized sub-histograms into one histogram, each sub-histogram accumulating a different type of angle over the local surface patch. The experimental results show that LSAH is more robust to uneven point density and varying point cloud resolution than four state-of-the-art local descriptors in terms of feature matching. Moreover, we tested our LSAH-based coarse-to-fine registration algorithm; the results demonstrate that it is both robust and efficient.
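A structural sketch of how such a descriptor is assembled, with the five specific LSAH angle definitions left abstract: any five per-point angle arrays can be histogrammed, normalized, and concatenated as below.

```python
import numpy as np

def concat_angle_histograms(angle_lists, n_bins=15):
    """Concatenate five normalized angle histograms into one descriptor.
    The five specific angle types of LSAH are not reproduced; any
    per-point angle arrays (radians in [0, pi]) can be plugged in."""
    hists = []
    for angles in angle_lists:
        h, _ = np.histogram(angles, bins=n_bins, range=(0, np.pi))
        hists.append(h / max(h.sum(), 1))   # normalize each sub-histogram
    return np.concatenate(hists)

rng = np.random.default_rng(7)
desc = concat_angle_histograms([rng.uniform(0, np.pi, 200) for _ in range(5)])
print(desc.shape)                           # (75,) for 5 x 15 bins
```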
We present an extensible local feature descriptor that encodes both geometric and photometric information. We first construct a unique and stable local reference frame (LRF) from the spherical neighborhood of a feature point. All neighboring points are then transformed into the LRF to achieve invariance to rigid transformations. The spherical support region is divided into several spherical shells. In each shell, we compute the cosines of the angles between each point and the x-axis and z-axis, and map these two values into two one-dimensional (1-D) histograms. Finally, all the 1-D histograms are concatenated to form the signature of position angles histogram (SPAH) feature. The SPAH feature easily extends to a color SPAH (CSPAH) by adding a third 1-D histogram built from the photometric information of the points in each shell. SPAH and CSPAH were rigorously tested on several common datasets. The experimental results show that both descriptors are highly descriptive and robust under Gaussian noise and varying mesh decimation. Moreover, we tested SPAH- and CSPAH-based three-dimensional object recognition algorithms on four standard datasets, where they outperformed the state-of-the-art algorithms.
This paper presents a robust, rotation-invariant local surface descriptor that encodes the position angles of neighboring points, defined in a stable and unique local reference frame (LRF), into 1-D histograms. The procedure has two stages. The first stage constructs a unique LRF by eigenvalue decomposition of the covariance matrix formed from all the neighboring points on the local surface. In the second stage, the spherical support region of a keypoint is divided along the radius into several spherical shells, similar to the Signature of Histograms of OrienTations (SHOT). In each shell, we compute the cosines of the angles between the neighboring points and the x-axis and z-axis, respectively, to form two 1-D histograms. Finally, all the 1-D histograms are concatenated and normalized to generate the local surface descriptor. Experimental results show that the proposed descriptor is robust to noise and varying mesh resolution. Moreover, our descriptor-based 3D object recognition algorithm achieves a high average recognition rate of 98.9% on the whole UWA dataset.
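A minimal sketch of the accumulation stage shared by this descriptor and the SPAH above, assuming the LRF has already been constructed: neighbors are expressed in the frame, the support sphere is split into radial shells, and the cosines with the x- and z-axes are histogrammed per shell. Shell and bin counts are illustrative.

```python
import numpy as np

def shell_cosine_descriptor(neighbors, query, lrf, radius,
                            n_shells=4, n_bins=10):
    """Concatenate per-shell histograms of cos(angle to x-axis) and
    cos(angle to z-axis); `lrf` holds the frame axes as rows."""
    local = (neighbors - query) @ lrf.T          # express points in the LRF
    r = np.linalg.norm(local, axis=1)
    keep = (r > 1e-9) & (r <= radius)
    local, r = local[keep], r[keep]
    cos_x, cos_z = local[:, 0] / r, local[:, 2] / r
    shell = np.minimum((r / radius * n_shells).astype(int), n_shells - 1)
    hists = []
    for s in range(n_shells):
        for cos in (cos_x[shell == s], cos_z[shell == s]):
            h, _ = np.histogram(cos, bins=n_bins, range=(-1, 1))
            hists.append(h)
    desc = np.concatenate(hists).astype(float)
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc           # final normalization

pts = np.random.default_rng(2).normal(size=(500, 3))
print(shell_cosine_descriptor(pts, np.zeros(3), np.eye(3), radius=2.0).shape)
```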
A gas pressure sensor based on the antiresonant reflecting guidance mechanism in a hollow-core fiber (HCF) with an open microchannel is experimentally demonstrated. The microchannel is created on the ring cladding of the HCF by a femtosecond laser so that the air-core pressure equals the external pressure. The HCF cladding functions as an antiresonant reflecting waveguide, which induces sharp periodic losses in the transmission spectrum. The proposed sensor is miniature and robust, and it exhibits a high pressure sensitivity of 3.592 nm/MPa and a low temperature cross-sensitivity of 7.5 kPa/°C.
KEYWORDS: Cameras, Detection and tracking algorithms, Imaging systems, 3D modeling, Optical tracking, Databases, Systems modeling, Performance modeling, Optical engineering, RGB color model
We present an approach for real-time camera tracking from a depth stream. Existing methods are prone to drift in scenes lacking sufficient geometric information. First, we propose a new weighting method for the iterative closest point algorithm commonly used in real-time dense mapping and tracking systems. By detecting uncertainty in the pose and increasing the weight of points that constrain unstable transformations, our system achieves accurate and robust trajectory estimation. Our pipeline can be fully parallelized on the GPU and incorporated seamlessly into current real-time depth camera tracking systems. Second, we compare state-of-the-art weighting algorithms and propose a weight degradation algorithm based on the measurement characteristics of a consumer depth camera. Third, we use Nvidia Kepler shuffle instructions during warp and block reduction to improve the system's efficiency. Results on the public TUM RGB-D benchmark demonstrate that our camera tracking system achieves state-of-the-art accuracy and efficiency.
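A hedged sketch of depth-driven weight degradation: the axial noise of a consumer structured-light depth camera grows roughly quadratically with depth, so distant points can be down-weighted by inverse variance. The constants follow published Kinect noise characterizations and are illustrative rather than the paper's exact function.

```python
import numpy as np

def depth_weights(depths):
    """Per-point ICP weights from a quadratic axial-noise model,
    sigma(z) ~ 0.0012 + 0.0019 * (z - 0.4)^2 (meters, Kinect-like).
    Weight = inverse variance, normalized to a maximum of 1."""
    sigma = 0.0012 + 0.0019 * (depths - 0.4) ** 2
    w = 1.0 / sigma ** 2
    return w / w.max()

print(depth_weights(np.array([0.5, 1.0, 2.0, 4.0])))  # far points weigh less
```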
KEYWORDS: 3D modeling, Object recognition, Laser range finders, 3D image processing, Detection and tracking algorithms, Data modeling, Optical engineering, Statistical modeling, Image resolution, Instrument modeling
This paper presents a highly distinctive and robust local three-dimensional (3-D) feature descriptor named the longitude and latitude spin image (LLSI). The procedure has two modules: local reference frame (LRF) definition and LLSI feature description. We employ the same technique as Tombari to define the LRF. The LLSI descriptor is obtained by stitching the longitude and latitude (LL) image vertically onto the original spin image, where the LL image is generated analogously to the spin image by mapping a two-tuple (θ, φ) into a discrete two-dimensional histogram. The performance of the proposed LLSI descriptor was rigorously tested on a number of popular, publicly available datasets. The results show that our method is more robust to noise and varying mesh resolution than existing techniques. Finally, we tested our LLSI-based 3-D object recognition algorithm on two popular datasets, achieving recognition rates of 100%, 98.2%, and 96.2% on the Bologna dataset, the University of Western Australia (UWA) dataset (up to 84% occlusion), and the whole UWA dataset, respectively. Moreover, the LLSI-based algorithm achieves a 100% recognition rate on the whole UWA dataset when the LLSI descriptor is generated with the LRF proposed by Guo.
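For reference, a minimal sketch of the classic spin-image half of the descriptor: each neighbor maps to cylindrical coordinates (α, β) about the oriented point and is accumulated into a 2-D histogram; the LL image built from (θ, φ) would be stitched below it in the same manner.

```python
import numpy as np

def spin_image(neighbors, p, n, bin_size=0.05, size=16):
    """Classic spin image at point p with unit normal n:
    alpha = radial distance to the normal line, beta = height along n."""
    d = neighbors - p
    beta = d @ n
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta ** 2, 0.0))
    i = (beta / bin_size + size / 2).astype(int)   # beta can be negative
    j = (alpha / bin_size).astype(int)
    img = np.zeros((size, size))
    keep = (0 <= i) & (i < size) & (0 <= j) & (j < size)
    np.add.at(img, (i[keep], j[keep]), 1.0)        # accumulate neighbor counts
    return img

pts = np.random.default_rng(3).normal(scale=0.2, size=(1000, 3))
print(spin_image(pts, np.zeros(3), np.array([0., 0., 1.])).sum())
```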
To enable non-cooperative rendezvous, capture, and removal of large space debris, robust and fast tracking of the non-cooperative target is needed. This paper proposes an improved real-time visual tracking algorithm for space non-cooperative targets based on a three-dimensional model; it requires no artificial markers. The target's 3D model is assumed known, and the target is assumed to remain in the field of view of the camera mounted on the chaser. Space non-cooperative targets are treated as weakly textured man-made objects whose 3D design documents are available. Since space appears black, we can assume that only the object is visible and the image background is dark. Because edge features offer good invariance to illumination changes and image noise, our method relies on monocular vision and uses 3D-2D correspondences between the 3D model and its corresponding 2D edges in the image. We propose to remove sample points that are susceptible to false matches, based on the geometric distance induced by perspective projection of the 3D model. For better robustness, we compare local region similarity to obtain better matches between sample points and edge points. Our algorithm proves efficient and shows improved accuracy without significant computational burden. The results show promising tracking performance, with mean errors of < 3 degrees and < 1.5% of range.
Shape Matching under Affine Transformation (SMAT) is an important issue in shape analysis. Most existing SMAT methods are sensitive to noise or computationally complicated because they extract edge points or compute high-order functions of the shape. To solve these problems, we propose a new SMAT method that combines low-order shape normalization with multi-scale area integral features. First, affinely transformed shapes are normalized into their orthogonal representations according to their moments and an equivalent resampling. This procedure transforms the shape by several linear operations: translation, scaling, and rotation, followed by a resampling operation. Second, the Multi-Scale Area Integral Features (MSAIF), which are invariant to orthogonal transformations (rotation and reflection), are extracted. The MSAIF is a signature obtained by concatenating the area integral feature at a range of scales from fine to coarse. The area integral feature integrates, over the shape domain, feature values computed by convolving the shape with an isotropic kernel and taking the complement, normalized by the area of the shape. Finally, different shapes are matched according to a dissimilarity measured with the optimal transport distance. The performance of the proposed method is tested on the car dataset and the multi-view curve dataset. Experimental results show that the proposed method is efficient and robust and can be used in many shape analysis tasks.
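A hedged sketch of the low-order normalization idea: translating a point set to its centroid and whitening with the inverse square root of its second-moment matrix reduces any affine distortion to an unknown orthogonal transform. The equivalent resampling step is omitted, and the point-set form is a simplification of the region-based computation.

```python
import numpy as np

def moment_normalize(pts):
    """Affine-normalize 2D points: after whitening, two affinely related
    shapes differ only by an orthogonal transform (rotation/reflection)."""
    c = pts.mean(axis=0)
    d = pts - c
    M = d.T @ d / len(pts)                      # second-order central moments
    w, V = np.linalg.eigh(M)
    W = V @ np.diag(1.0 / np.sqrt(w)) @ V.T     # symmetric M^(-1/2)
    return d @ W.T

rng = np.random.default_rng(4)
shape = rng.normal(size=(300, 2))
A = np.array([[2.0, 0.7], [0.3, 1.5]])          # an affine (linear) distortion
n1, n2 = moment_normalize(shape), moment_normalize(shape @ A.T)
print(np.allclose(n1.T @ n1 / 300, np.eye(2)))  # moments become the identity
```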
Reliable and stable visual perception systems are needed for humanoid robotic assistants to perform complex grasping and manipulation tasks; recognition of the object and its precise 6D pose are required. This paper addresses the challenge of detecting and localizing a textureless known object by estimating its complete 6D pose in cluttered scenes. We propose a 3D perception system that robustly recognizes CAD models in cluttered scenes for grasping with a mobile manipulator. Our approach combines two camera technologies, Time-of-Flight (TOF) and RGB, to segment the scene and extract objects: the depth image and the gray image together are used to recognize instances of a 3D object in the world and estimate their poses. The full pose estimation process is based on depth-image segmentation and efficient shape-based matching. First, the depth image is used to separate the supporting plane of the objects from the cluttered background, so cluttered backgrounds are circumvented and the search space is greatly reduced. A hierarchical model based on the geometry of the object's a priori CAD model is generated in an offline stage. Then, using this hierarchical model, we perform shape-based matching in the 2D gray images. Finally, we validate the proposed method in a number of experiments. The results show that using depth and gray images together meets the demands of a time-critical application and significantly reduces the object recognition error rate.
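A minimal stand-in for the supporting-plane separation step, assuming the depth image has already been converted to a point cloud: RANSAC finds the dominant plane so that points above it can be treated as object candidates.

```python
import numpy as np

def ransac_plane(pts, n_iter=200, tol=0.01, rng=np.random.default_rng(8)):
    """Fit the dominant plane n . x = d with RANSAC; returns
    (normal, d, inlier mask). A generic supporting-plane detector."""
    best = (None, None, np.zeros(len(pts), bool))
    for _ in range(n_iter):
        p0, p1, p2 = pts[rng.choice(len(pts), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        nn = np.linalg.norm(n)
        if nn < 1e-12:
            continue                      # degenerate (collinear) sample
        n /= nn
        d = n @ p0
        inliers = np.abs(pts @ n - d) < tol
        if inliers.sum() > best[2].sum():
            best = (n, d, inliers)
    return best

table = np.random.default_rng(9).uniform(-1, 1, (500, 3)); table[:, 2] = 0.75
objs = np.random.default_rng(10).uniform(-0.2, 0.2, (100, 3)) + [0, 0, 0.9]
n, d, inl = ransac_plane(np.vstack([table, objs]))
print(inl.sum())                          # ~500: the table-plane points
```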
Human eyes cannot notice low-contrast objects in an image. Image contrast enhancement can make such unnoticed objects noticeable so that humans can detect and recognize them. To guide the design of enhancement methods, their performance for object detection and recognition (ODR) should be evaluated. Existing evaluation methods assess image enhancement by calculating the increment in contrast or image information entropy. However, human detection and recognition of objects in an image is essentially an information transmission process, and image contrast enhancement can be viewed as a form of image coding. Based on human visual properties, this paper models the transmission process of ODR information and proposes a performance evaluation method grounded in Shannon's information theory.
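As a hedged illustration of the information-theoretic quantity that existing evaluation methods compare, the sketch below computes the Shannon entropy of an image's gray-level distribution; the paper's full ODR transmission model is not reproduced.

```python
import numpy as np

def gray_entropy(img, levels=256):
    """Shannon entropy (bits) of an image's gray-level distribution."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                          # drop empty bins (0 log 0 := 0)
    return -np.sum(p * np.log2(p))

flat = np.full((64, 64), 128)                        # a single gray level
noisy = np.random.default_rng(5).integers(0, 256, (64, 64))
print(gray_entropy(flat), gray_entropy(noisy))       # 0 bits vs ~8 bits
```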
Template matching is a significant approach in machine vision due to its effectiveness and robustness. However, most template matching methods are so time consuming that they cannot be used in many real-time applications. Closed-contour matching is a popular kind of template matching. This paper presents a new closed-contour template matching method suitable for two-dimensional objects. A coarse-to-fine search strategy improves matching efficiency, and a partial computation elimination scheme further speeds up the search. The method consists of offline model construction and online matching. During model construction, triples and a distance image are obtained from the template image. A number of triples, each composed of three points, are created from the contour extracted from the template image; the three points are chosen so that they divide the template contour into three equal parts. The distance image is obtained by a distance transform: each pixel stores the nearest distance from that pixel to the template contour. During matching, triples of the search image are created by the same rule as the model triples. Exploiting the similarity between triangles, which is invariant to rotation, translation, and scaling, the triples corresponding to the model triples are found, which yields the initial RST (rotation, scaling, translation) parameters mapping the search contour to the template contour. To speed up the search, the points on the search contour are sampled to reduce the number of triples. To verify the RST parameters, the search contour is projected into the distance image, and the mean distance can be computed rapidly with simple additions and multiplications. In the fine search stage, the initial RST parameters are discretized to obtain the final accurate pose of the object. Experimental results show that the proposed method is sound and efficient and can be used in many real-time applications.
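A minimal sketch of the verification step for one RST hypothesis: transform the sampled search contour, index the precomputed distance image, and average. SciPy's Euclidean distance transform stands in for whichever transform the method uses, and the toy contour is illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def mean_contour_distance(template_mask, contour_pts, scale, theta, t):
    """Score an RST hypothesis: mean distance-image value at the
    transformed contour points (lower means a better match)."""
    dist = distance_transform_edt(~template_mask)   # 0 on the contour
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    p = contour_pts @ (scale * R).T + t             # apply R, S, then T
    ij = np.round(p).astype(int)
    ij = np.clip(ij, 0, np.array(dist.shape)[::-1] - 1)   # (x, y) order
    return dist[ij[:, 1], ij[:, 0]].mean()

mask = np.zeros((100, 100), dtype=bool)
mask[20, 20:80] = True                              # a line as a toy "contour"
pts = np.column_stack([np.arange(20, 80), np.full(60, 20)])   # (x, y) points
print(mean_contour_distance(mask, pts, 1.0, 0.0, np.zeros(2)))  # ~0.0
```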
KEYWORDS: 3D image processing, Reconstruction algorithms, 3D image reconstruction, 3D modeling, Laser imaging, Image processing, Clouds, 3D displays, Imaging systems, 3D acquisition
Research on three-dimensional (3D) surface reconstruction from range slices obtained with a range-gated laser imaging system is of significance. 3D surfaces reconstructed by the existing binarization method or centroid method are rough or discontinuous in some circumstances. In this paper we address these problems and develop a 3D surface reconstruction algorithm that combines the centroid method with weighted linear interpolation and mean filtering. The algorithm consists of three steps. First, regions of interest are extracted from each range slice using a mean filter and merged into a single range image. Second, the derived range image is denoised and smoothed using an adaptive histogram method, weighted linear interpolation, and mean filtering, respectively. Finally, nonzero pixels in the processed range image are converted to a point cloud according to the range-gated imaging parameters, and 3D surface meshes are built from the point cloud based on the topological relationship between adjacent pixels in the range image. Experiments are conducted on range slices generated by a range-gated laser imaging simulation platform. Registration of the surface reconstructed by our method against the original object surface shows that the proposed method reconstructs the object surface accurately, so it can be used in the design of reconstruction and display for range-gated laser imaging systems, as well as for 3D object recognition.
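A hedged sketch of the final conversion step, with a pinhole model standing in for the actual range-gated imaging parameters: every nonzero range pixel is back-projected to a 3D point.

```python
import numpy as np

def range_image_to_points(rng_img, fx, fy, cx, cy):
    """Back-project nonzero pixels of a range image into a point cloud,
    treating pixel values as depth along the optical axis."""
    v, u = np.nonzero(rng_img)
    z = rng_img[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])

img = np.zeros((480, 640))
img[200:280, 300:340] = 5.0               # a patch of surface at 5 m
print(range_image_to_points(img, 525., 525., 320., 240.).shape)
```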
KEYWORDS: 3D modeling, 3D image processing, Target recognition, Object recognition, 3D acquisition, Detection and tracking algorithms, Image processing, 3D imaging standards, Image filtering, Principal component analysis
Spin images have been applied successfully in 3D object recognition systems because of their invariance to rotation, translation, and viewpoint. However, the method is very time consuming, owing to its high-dimensional characteristics and its complicated matching procedure. To reduce recognition time, in this paper we propose a coarse-to-fine matching strategy for spin images with two steps. First, a low-dimensional feature is introduced for a given point. The feature contains two components: the first is the perpendicular distance from the centroid of the given point's neighborhood to the point's tangent plane; the second is the maximum distance between the projection of the centroid onto the tangent plane and the projections of the neighborhood points onto that plane. Second, when comparing a point from a target with a point from a model, their low-dimensional features are matched first; only if they satisfy the low-dimensional feature constraints are they selected as a candidate point pair, whose spin images are then further matched by a similarity measure. After all target and model points finish this matching process, the candidate pairs with high spin-image similarity are selected as corresponding point pairs, and the target is recognized as the model with the largest number of corresponding pairs. Experiments on the Stanford 3D models, compared against the standard spin image, show that the proposed method is more efficient while maintaining the standard spin image's advantages.
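A minimal sketch of the two-component coarse feature described above, assuming each point carries a unit normal that defines its tangent plane; the constraint tolerance is illustrative.

```python
import numpy as np

def coarse_feature(p, n, neighbors):
    """Two-component low-dimensional feature for coarse matching:
    (1) distance from the neighborhood centroid to p's tangent plane,
    (2) max in-plane distance between the projected centroid and the
        projected neighborhood points."""
    c = neighbors.mean(axis=0)
    f1 = abs((c - p) @ n)                          # out-of-plane offset
    proj = lambda q: (q - p) - ((q - p) @ n)[..., None] * n
    f2 = np.linalg.norm(proj(neighbors) - proj(c), axis=1).max()
    return np.array([f1, f2])

def coarse_match(fa, fb, tol=0.1):
    """Candidate pair only if both components agree within tol."""
    return bool(np.all(np.abs(fa - fb) < tol))

pts = np.random.default_rng(6).normal(size=(50, 3))
f = coarse_feature(np.zeros(3), np.array([0., 0., 1.]), pts)
print(f, coarse_match(f, f))
```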
KEYWORDS: Local contrast enhancement, adjacent pixel gray order-preserving principle, iterative algorithm, image quality assessment
This paper is devoted to local image enhancement. First, we propose an adjacent-pixel gray order-preserving principle, the foundation of the local enhancement method, which ensures that the processed image is free of distortion. We then propose an iterative algorithm that stretches the gray-scale difference between adjacent pixels without changing their gray-level ordering. Finally, we propose a full-reference image quality assessment method based on the order-preserving principle. Using this assessment method, we ran comparative experiments against local histogram equalization. The results show that the proposed enhancement method scores higher and provides better visual effects, demonstrating its effectiveness.
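A hedged sketch of one possible reading of the iterative stretch: each pass amplifies deviations from the local mean and is accepted only if no adjacent-pixel gray-level ordering flips relative to the input. This is a simplified illustration, not the paper's algorithm.

```python
import numpy as np

def order_preserving_stretch(img, gain=1.5, iters=5):
    """Iteratively stretch differences from the 3x3 local mean, accepting
    an iteration only if the sign of every horizontal and vertical
    adjacent-pixel difference of the input image is preserved."""
    def adj_signs(a):
        return np.sign(np.diff(a, axis=0)), np.sign(np.diff(a, axis=1))
    out = img.astype(float)
    s0 = adj_signs(out)                   # reference ordering pattern
    for _ in range(iters):
        pad = np.pad(out, 1, mode='edge')
        mean = sum(pad[i:i + out.shape[0], j:j + out.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
        cand = np.clip(mean + gain * (out - mean), 0, 255)
        if all(np.array_equal(a, b) for a, b in zip(s0, adj_signs(cand))):
            out = cand                    # accept: ordering unchanged
        else:
            break                         # stop before violating the order
    return out

img = np.tile(np.arange(0, 256, 16, dtype=float), (16, 1))
print(np.ptp(order_preserving_stretch(img)) >= np.ptp(img))   # True
```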
Texture is of vital importance when rendering realistic infrared images. Traditional attempts to simulate infrared textures often lack the realism seen in actual infrared images. This paper presents a synthesis model for generating the infrared texture of aeolian sand ripples. By integrating a physical radiation model with a texture structure model, the sources of variability that cause IR texture are modeled. The physical radiation model takes geometrical, thermodynamic, and meteorological parameters into account and calculates the thermal radiation distribution of different surface slopes. The texture structure model is based on natural landscape features and simulates the spatial distribution patterns of aeolian sand ripples. Simulation results indicate that the proposed method produces radiometrically correct yet visually appealing infrared textures.
A catadioptric optical system employs both lens and mirror components in the optical path and can capture a nearly 360° field of view. Such a system has the advantage of processing only one continuous image, avoiding the discontinuities at view boundaries that a ring of conventional cameras would introduce. This paper presents a novel catadioptric sensor configuration, a precise omnidirectional stereo vision optical device (OSVOD) based on a common perspective camera coupled with two hyperbolic mirrors. Because the hyperbolic mirrors ensure a single viewpoint (SVP), the incident light rays are easily recovered from image points. In our system the two hyperbolic mirrors, which are aligned coaxially and separated, share one focus that coincides with the camera center, so the geometry naturally yields matched epipolar lines in the two images of the scene. The separation between the two mirrors provides a large baseline and hence precise results. These properties make the system especially suitable for omnidirectional stereo vision, because depth estimation is simple, fast, and precise. The proposed system can be used for obstacle detection by mobile robots, automatic environment mapping, and machine vision applications requiring fast, real-time computation.
This paper describes a station-keeping visual servoing method for an underwater vehicle based on texture and global feature analysis. Most systems with the same function rely on artificial targets or other salient features, such as corners, lines, or outlines, for servoing. In some cases, however, the target, especially a natural object, lacks salient features for identification. Texture and certain region characteristics can then be treated as inherent features. Natural texture elements have spatial relationships that change with the distance and relative position between the camera and the texture elements. Building on an analysis of texture, this paper presents an automatic texture-region recognition and tracking algorithm. A satisfactory result from simulated servo control of a four-degree-of-freedom underwater vehicle is shown at the end of the paper.
KEYWORDS: Visual process modeling, Model-based design, 3D modeling, Visualization, Cameras, 3D metrology, Quantization, 3D vision, Mobile robots, Imaging systems
Determining the relative pose between two space objects and realizing their cooperation is a hot topic in robotics research. 3D vision is the main method for measuring the pose of a space object. Because of the rigidity requirement during pose computation, the 3D vision used in this setting is model-based; that is, the model of the markers must be used as a strict constraint during the computation. This paper first presents the method of computing the 3D pose and then discusses the model-constraint problem in detail, covering model-based monocular vision and model-based binocular stereo. A comparative analysis of the two methods and experimental results are given at the end.
Determining its own global position is a key capability of an autonomous mobile robot. For this purpose, various location methods and sensors are being researched and developed. A vision-based global location sensor system was developed in our lab. Its advantages are a larger field of view that removes the need to scan for surrounding landmarks, a faster sampling rate, and a position resolution precise enough for mobile robot navigation. In this paper, the system architecture and the locating method are presented. Key techniques in the location processing, including image processing, optimal landmark selection, and improving system precision and speed, are discussed in detail, and experimental results are given at the end.
The 'Rainbow Range Finder' is a method for rapidly acquiring 3D information based on spectral analysis. A special light source with a continuous spectrum is projected across the objects, so the image presents a regular change of colors. Each color forms a line in the color image and corresponds to a light plane formed by a wave band of the spectrum. When all the light planes are calibrated and the camera model is known, the 3D coordinates of all image points in the scene can be calculated. This paper mainly discusses the light-plane calibration and color-classification techniques needed to implement the method.
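A minimal sketch of the triangulation that the calibration enables: intersect the camera ray through a pixel with the calibrated light plane associated with that pixel's color class. The intrinsics and plane parameters are illustrative.

```python
import numpy as np

def triangulate(pixel, K, plane_n, plane_d):
    """Intersect the viewing ray of `pixel` (camera at the origin)
    with the calibrated light plane n . X = d; returns the 3D point."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    t = plane_d / (plane_n @ ray)          # solve n . (t * ray) = d
    return t * ray

K = np.array([[800., 0, 320], [0, 800., 240], [0, 0, 1]])
n = np.array([0., 0., 1.])                 # a fronto-parallel plane, z = 2
print(triangulate((400, 300), K, n, 2.0))  # [0.2, 0.15, 2.0]
```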