The ability to detect and match features across multiple views of a scene is a crucial first step in many computer vision
algorithms for dynamic scene analysis. State-of-the-art methods such as SIFT and SURF perform well on typical
images taken by a digital camera or camcorder. However, these methods often fail to generate an
acceptable number of features when applied to medical images, because such images usually contain large homogeneous
regions with little color and intensity variation. As a result, tasks like image registration and 3D structure recovery
become difficult or impossible in the medical domain.
This paper presents a scale-, rotation-, and color/illumination-invariant feature detector and descriptor for medical
applications. The method incorporates elements of SIFT and SURF while optimizing their performance on medical data.
Based on experiments with various types of medical images, we combined, adjusted, and built on methods and
parameter settings employed in both algorithms. An approximate Hessian-based detector locates scale-invariant
keypoints, and a dominant orientation is assigned to each keypoint using a gradient orientation histogram, providing
rotation invariance. Finally, keypoints are described with an orientation-normalized distribution of gradient responses at
the assigned scale, and the feature vector is normalized for contrast invariance. Experiments show that the algorithm
detects and matches far more features than SIFT and SURF on medical images, with similar error levels.
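As a rough illustration of the orientation-assignment step described above, the following sketch builds a magnitude-weighted gradient orientation histogram over a patch around a keypoint and takes the strongest bin as the dominant orientation. This is a minimal SIFT-style version under assumed names and parameters (patch extraction, a 36-bin histogram), not the paper's exact procedure.

```python
import numpy as np

def dominant_orientation(patch, num_bins=36):
    """Assign a dominant orientation to a keypoint from a gradient
    orientation histogram (illustrative sketch, not the paper's code)."""
    # Image gradients over the grayscale patch around the keypoint.
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0    # orientation in [0, 360)

    # Magnitude-weighted histogram of gradient orientations.
    hist, _ = np.histogram(ang, bins=num_bins, range=(0.0, 360.0), weights=mag)

    # Dominant orientation = center of the strongest bin.
    peak = int(np.argmax(hist))
    return (peak + 0.5) * (360.0 / num_bins)
```

The contrast normalization the abstract mentions amounts to dividing the final descriptor vector by its L2 norm (`v / np.linalg.norm(v)`).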
KEYWORDS: 3D modeling, Cameras, Video, 3D image processing, Computer simulations, Visual process modeling, Process modeling, Image registration, Data modeling, Motion models
3D computer models of body anatomy can have many uses in medical research and clinical practice. This paper
describes a robust method that uses videos of body anatomy to construct multiple, partial 3D structures and
then fuse them to form a larger, more complete computer model using the structure-from-motion framework.
We employ the Double Dog-Leg (DDL) method, a trust-region-based nonlinear optimizer, to jointly
optimize the camera motion parameters (rotation and translation) and determine a global scale that all partial
3D structures should agree upon. These optimized motion parameters are used for constructing local structures,
and the global scale is essential for multi-view registration after all these partial structures are built. In order
to provide a good initial guess of the camera movement parameters and outlier free 2D point correspondences
for DDL, we also propose a two-stage scheme in which multi-RANSAC with a normalized eight-point algorithm
is performed first, and a few iterations of an over-determined five-point algorithm are then used to polish the
results. Our experimental results using colonoscopy video show that the proposed scheme always produces more
accurate outputs than the standard RANSAC scheme. Furthermore, since we have obtained many reliable point
correspondences, time-consuming and error-prone registration methods such as iterative closest point (ICP)
based algorithms can be replaced by a simple rigid-body transformation solver when merging partial structures
into a larger model.
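As a sketch of the rigid-body transformation solver mentioned in the last sentence, the function below computes the closed-form least-squares rotation, uniform scale, and translation between two sets of corresponding 3D points via SVD (the standard Kabsch/Umeyama-style solution). The function name and interface are assumptions; the paper's exact solver may differ.

```python
import numpy as np

def rigid_transform(P, Q):
    """Closed-form least-squares alignment mapping point set P onto Q,
    both (N, 3) arrays of corresponding 3D points. Returns rotation R,
    uniform scale s, and translation t such that Q ~ s * R @ P + t."""
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - mu_p, Q - mu_q

    # SVD of the cross-covariance gives the optimal rotation.
    U, S, Vt = np.linalg.svd(Pc.T @ Qc)
    sign = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, sign])       # guard against a reflection
    R = Vt.T @ D @ U.T

    # Optimal uniform scale and translation follow in closed form.
    s = (S * np.diag(D)).sum() / (Pc ** 2).sum()
    t = mu_q - s * R @ mu_p
    return R, s, t
```

With reliable correspondences from the two-stage scheme, a single call like `R, s, t = rigid_transform(P, Q)` replaces the iterative alignment that ICP would otherwise perform.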
KEYWORDS: 3D modeling, Cameras, 3D image processing, Video, Colon, Solid modeling, Visual process modeling, Motion models, Data modeling, Computing systems
A 3D colon model is an essential component of a computer-aided diagnosis (CAD) system in colonoscopy to
assist surgeons in visualization, surgical planning, and training. This research is thus aimed at developing
the ability to construct a 3D colon model from endoscopic videos (or images). This paper summarizes our ongoing
research in automated model building in colonoscopy. We have developed the mathematical formulations
and algorithms for modeling static, localized 3D anatomic structures within a colon that can be rendered from
multiple novel viewpoints for close scrutiny and precise dimensioning. This ability is useful when a surgeon
notices some abnormal tissue growth and wants a close inspection and precise measurement. Our
modeling system uses only video images and follows a well-established computer-vision paradigm for image-based
modeling. We extract prominent features from images and establish their correspondences across multiple images
by continuous tracking and discrete matching. We then use these feature correspondences to infer the camera's
movement. The camera motion parameters allow us to rectify images into a standard stereo configuration and
calculate pixel movements (disparity) in these images. The inferred disparity is then used to recover 3D surface
depth. The inferred 3D depth, together with texture information recorded in images, allow us to construct a 3D
model with both structure and appearance information that can be rendered from multiple novel view points.
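To make the disparity-to-depth step concrete, the sketch below applies the standard rectified-stereo relation Z = f * B / d. The names `focal_px` (focal length in pixels) and `baseline` (translation between the two rectified views, obtained here from the inferred camera motion) are illustrative assumptions.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline):
    """Recover per-pixel depth from a disparity map of a rectified
    stereo pair using Z = focal_px * baseline / disparity."""
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(d, np.inf)     # no depth where disparity is zero
    valid = d > 0
    depth[valid] = focal_px * baseline / d[valid]
    return depth
```

Back-projecting these depths through the camera model, with the image texture attached, yields the renderable 3D model described above.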