Sparse-view computed tomography (CT) has become a major topic in the medical imaging field due to its reduced X-ray radiation dose. Recently, various convolutional neural network (CNN)-based approaches have been proposed, which require pairs of full- and sparse-view CT images for network training. However, acquiring such paired data is impractical or difficult in clinical practice. To handle this problem, we propose weakly-supervised learning for streak artifact reduction with unpaired sparse-view CT data. To build the CNN training dataset, we generate pairs of input and target images from the given sparse-view CT data. We then iteratively apply the trained network to the given sparse-view CT images to acquire prior images. The key to our framework is that we estimate the original streak artifacts in the given sparse-view CT images from the prior images and subtract the estimated streak artifacts from the given sparse-view CT images. As a result, the proposed method achieves the best lesion-detection performance among the compared methods.
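A minimal sketch of the estimation-and-subtraction step under stated assumptions: forward_project_sparse and fbp_sparse are hypothetical operators sharing the known sparse-view geometry, and prior_image is the result of iteratively applying the trained network.

def subtract_estimated_streaks(sparse_fbp_image, prior_image,
                               forward_project_sparse, fbp_sparse):
    # Streaks are deterministic for a fixed sparse-view geometry, so
    # re-simulating the sparse-view scan of the smooth prior reproduces them.
    prior_with_streaks = fbp_sparse(forward_project_sparse(prior_image))
    streak_estimate = prior_with_streaks - prior_image
    # Remove the estimated streaks from the original sparse-view FBP image.
    return sparse_fbp_image - streak_estimate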
In our previous work, we proposed to reduce motion artifacts in computed tomography (CT) using an image-based convolutional neural network (CNN). However, its motion compensation performance was limited when the degree of motion was large. We note that a fast scan mode can reduce the degree of motion but also causes streak artifacts due to sparse view sampling. In this study, we aim to initially reduce motion artifacts using a fast scan mode, and then to reduce both the streak and motion artifacts using a CNN-based two-phase approach. In the first phase, we focus on reducing streak artifacts caused by sparse projection views. To effectively reduce streak artifacts in the presence of motion artifacts, a CNN with the U-net architecture and a residual learning scheme was used. In the second phase, we focus on compensating motion artifacts in the output images of the first phase. For this task, attention blocks with global average pooling were used. To generate datasets, we used extended cardiac-torso phantoms and simulated sparse-view CT using half, quarter, and one-eighth of the full projection views with corresponding 6-degree-of-freedom rigid motions. The results showed that the proposed two-phase approach effectively reduced both the motion and streak artifacts, and that reducing the number of projection views down to one-eighth (thus increasing scanning speed) provided better image quality in our simulation study.
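A minimal sketch of an attention block with global average pooling, assuming a squeeze-and-excitation-style design; the paper's exact block may differ, and all sizes are illustrative.

import torch.nn as nn

class GAPAttention(nn.Module):
    # Channel attention: global average pooling followed by a gating MLP.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight feature maps channel-wise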
Sparse-view computed tomography (CT) has been attracting attention for its reduced radiation dose and scanning time. However, analytical image reconstruction methods such as filtered back-projection (FBP) suffer from streak artifacts due to sparse-view sampling. Because the streak artifacts are deterministic errors, we argue that the same artifacts can be reasonably estimated using a prior image (i.e., a smooth image of the same patient) with known imaging system parameters. Based on this idea, we reconstruct an FBP image with sparse-view projection data, regenerate the streak artifacts by forward and back-projection of a prior image with sparse views, and then subtract them from the original FBP image. For this approach to succeed, the prior image needs to be patient-specific and easily obtained from the given sparse-view projection data. Therefore, we introduce a new concept of implicit neural representations for modeling attenuation coefficients. In an implicit neural representation, a neural network outputs a patient-specific attenuation coefficient value for an input pixel coordinate. In this way, the network’s parameters serve as an implicit representation of a CT image. Unlike conventional deep learning approaches that utilize a large, labeled dataset, an implicit neural representation is optimized using only the sparse-view projection data of a single patient. This avoids a bias toward a group of patients in a dataset and helps to properly capture the unique characteristics of the individual. We validated the proposed method using fan-beam CT simulation data of an extended cardiac-torso phantom and compared the results with total variation-based iterative reconstruction and an image-based convolutional neural network.
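A minimal sketch of the implicit neural representation, assuming a plain ReLU MLP (the actual architecture and any coordinate encoding are not specified here); the network maps a pixel coordinate to an attenuation value and is fitted to the patient's own sparse-view projection data.

import torch.nn as nn

class AttenuationINR(nn.Module):
    # MLP mapping a 2-D pixel coordinate to one attenuation coefficient.
    def __init__(self, hidden=256, depth=4):
        super().__init__()
        layers, dim = [], 2
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU(inplace=True)]
            dim = hidden
        layers.append(nn.Linear(dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, xy):  # xy: (N, 2) coordinates scaled to [-1, 1]
        return self.net(xy)

# Optimization (schematically): render the image on a pixel grid, forward
# project it with the known sparse-view geometry, and minimize the mismatch
# with the measured sparse-view sinogram.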
In this study, we propose a method to reduce streak artifacts in sparse-view CT images via a convolutional neural network (CNN). The main idea of the proposed method is to utilize both image- and sinogram-domain data for CNN training. To generate datasets, projection data were acquired from 512 (128) views using Siddon’s ray-driven algorithm, and full (sparse)-view CT images were reconstructed by filtered back-projection with a Ram-Lak filter. We first trained the U-net-based CNN_img, which was designed to reduce the streak artifacts of sparse-view CT in the image domain. Then, the output images of CNN_img were used as prior images to construct a pseudo full-view sinogram. Before upsampling, the sparse-view sinogram was normalized by the prior images, and linear interpolation was then employed to estimate the view data missing relative to the full-view sinogram. The upsampled data were denormalized using the prior images. To reduce the residual errors in the pseudo full-view sinogram data, we trained CNN_hybrid with a residual encoder-decoder CNN, which is known to be effective in reducing residual errors while preserving structural details. To increase learning efficiency, the dynamic range of the pseudo full-view sinogram data was converted via an exponential function. The results show that CNN_hybrid provides better streak artifact reduction than CNN_img, which is also confirmed by quantitative assessment.
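A minimal sketch of the normalize / interpolate / denormalize upsampling step, under these assumptions: prior_sino_sparse and prior_sino_full are hypothetical forward projections of the prior image at the sparse and full view angles, sinograms are (views, detector) arrays with uniformly spaced views, and eps guards against division by zero.

import numpy as np

def upsample_sinogram(sparse_sino, prior_sino_sparse, prior_sino_full, eps=1e-6):
    # Normalize the measured sparse-view sinogram by the prior sinogram.
    norm = sparse_sino / (prior_sino_sparse + eps)
    # Linearly interpolate the normalized data along the view axis.
    n_sparse, n_full = norm.shape[0], prior_sino_full.shape[0]
    sparse_idx = np.linspace(0, n_full - 1, n_sparse)
    full_idx = np.arange(n_full)
    norm_full = np.stack(
        [np.interp(full_idx, sparse_idx, norm[:, c]) for c in range(norm.shape[1])],
        axis=1,
    )
    # Denormalize with the full-view prior sinogram.
    return norm_full * (prior_sino_full + eps)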
Convolutional neural network (CNN)-based CT denoising methods have attracted great interest for improving the image quality of low-dose CT (LDCT) images. However, a CNN requires a large amount of paired data consisting of normal-dose CT (NDCT) and LDCT images, which are generally not available. In this work, we aim to synthesize paired data from NDCT images with an accurate image-domain noise insertion technique and investigate its effect on the denoising performance of the CNN. Fan-beam CT images were reconstructed using extended cardiac-torso phantoms with Poisson noise added to the projection data to simulate NDCT and LDCT. We estimated local noise power spectra and a variance map from an NDCT image using information on photon statistics and reconstruction parameters. We then synthesized image-domain noise by filtering and scaling white Gaussian noise using the local noise power spectra and variance map, respectively. The CNN architecture was U-net, and the loss function was a weighted summation of mean squared error, perceptual loss, and adversarial loss. The CNN was trained with NDCT and LDCT (CNN-Ideal) or NDCT and synthesized LDCT (CNN-Proposed). To evaluate denoising performance, we measured the root mean squared error (RMSE), structural similarity index (SSIM), noise power spectrum (NPS), and modulation transfer function (MTF). The MTF was estimated from the edge spread function of a circular object with 12 mm diameter and 60 HU contrast. Denoising results from CNN-Ideal and CNN-Proposed show no significant difference in any metric, providing high RMSE and SSIM scores relative to NDCT and NPS shapes similar to that of NDCT.
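A minimal sketch of the noise synthesis step, simplified to a single global NPS for brevity (the paper estimates local noise power spectra and a variance map); white Gaussian noise is shaped in the frequency domain and then scaled pixel-wise.

import numpy as np

def synthesize_noise(nps, variance_map, seed=None):
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(variance_map.shape)
    # Shape white Gaussian noise in the frequency domain with sqrt(NPS).
    shaped = np.fft.ifft2(np.fft.fft2(white) * np.sqrt(nps)).real
    # Rescale to unit variance, then scale by the local variance map.
    shaped /= shaped.std()
    return shaped * np.sqrt(variance_map)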
The signal-known-statistically (SKS) detection task is more relevant to clinical tasks than the signal-known-exactly (SKE) detection task. However, anthropomorphic model observers for SKS tasks have not been studied as much as those for SKE tasks. In this study, we compare the ability of conventional model observers (i.e., the channelized Hotelling observer and the nonprewhitening observer with an eye-filter) and a convolutional neural network (CNN) to predict human observer performance on SKS and background-known-statistically tasks in breast cone-beam CT images. For the model observers, we implement 1) a model that combines the responses of each signal template and 2) a two-layer CNN. We implement the two-layer CNN in linear and nonlinear schemes: the nonlinear CNN contains a max-pooling layer and a nonlinear activation function, which the linear CNN lacks. Both the linear and nonlinear CNN-based model observers predict the rank of human observer performance for different noise structures better than the conventional model observers.
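A minimal sketch of the two CNN variants, with illustrative layer sizes; a decision variable can be read out from the response map, e.g., by summation.

import torch.nn as nn

linear_cnn = nn.Sequential(      # no pooling, no nonlinearity
    nn.Conv2d(1, 8, kernel_size=5, padding=2),
    nn.Conv2d(8, 1, kernel_size=5, padding=2),
)
nonlinear_cnn = nn.Sequential(   # adds max pooling and an activation
    nn.Conv2d(1, 8, kernel_size=5, padding=2),
    nn.MaxPool2d(2),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, 1, kernel_size=5, padding=2),
)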
We proposed a convolutional neural network (CNN)-based anthropomorphic model observer to predict human observer detection performance for breast cone-beam CT images. We generated breast backgrounds with a 50% volume glandular fraction and inserted a 2 mm diameter spherical signal near the center. Projection data were acquired using a forward projection algorithm and were reconstructed using the Feldkamp-Davis-Kress reconstruction. To generate different noise structures, the projection data were filtered with Hanning, Shepp-Logan, and Ram-Lak filters, with and without Fourier interpolation, resulting in six different noise structures. To investigate the benefits of non-linearity in the CNN, we used two different network architectures: a linear CNN (Li-CNN) without any activation function and a multi-layer CNN (ML-CNN) with a leaky rectified linear unit. For comparison, we also used a nonprewhitening observer with an eye-filter (NPWE) peaking at a frequency of 7 cyc/deg based on our previous work. We trained the CNN to minimize the mean squared error using 12,000 pairs of signal-present and signal-absent images labeled with decision variables from NPWE. When labeling, the eye-filter parameter of NPWE was fine-tuned separately for each noise structure to match its percent correct to that of human observers. Note that we trained a single network for the different noise structures, whereas the template of NPWE was estimated for each noise structure. We conducted four-alternative forced-choice detection tasks, and the percent correct of human and model observers was compared. The results show that the proposed ML-CNN predicts the detection performance of human observers better than NPWE and Li-CNN.
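A minimal sketch of an NPWE decision variable, assuming a common eye-filter parameterization E(f) = f · exp(-c·f), which peaks at f = 1/c; the paper's exact filter form, the unit conversion to cyc/deg, and the tuned parameters are not reproduced here.

import numpy as np

def npwe_decision_variable(image, expected_signal, c):
    # Radial spatial-frequency grid (cycles/pixel; conversion to cyc/deg via
    # display geometry is omitted here).
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    f = np.hypot(fy, fx)
    eye = f * np.exp(-c * f)  # assumed eye-filter form, peak at f = 1/c
    # NPWE applies the eye filter to both the template and the image, so the
    # template is E^2 times the expected signal in the frequency domain.
    template = np.fft.ifft2(np.fft.fft2(expected_signal) * eye**2).real
    return float(np.sum(template * image))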
Recently, the necessity of low-dose CT imaging with reduced noise has come to the forefront due to the risks involved in radiation exposure. To acquire a high-quality image from a low-dose scan, which delivers a relatively small amount of radiation, various algorithms including deep learning-based methods have been proposed. However, current techniques have shown limited performance, especially with regard to losing fine details and blurring high-frequency edges. To improve on the previously suggested 2D patch-based denoising model, we propose a 3D block-based RED-CNN model, employing convolution layers paired with deconvolution layers, shortcuts, and residual mappings. This design preserves the image structure and diagnostic features of an image while smoothing noise. Finally, we applied a 3D bilateral filter and a 2D-based Landweber iteration method to suppress residual noise below a certain amplitude and prevent the edges from blurring. As a result, our proposed method effectively reduced the Poisson noise level without losing diagnostic features and showed high performance in both qualitative and quantitative evaluations compared to ResNet2D, ResNet3D, REDCNN2D, and REDCNN3D.
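A minimal sketch of a RED-CNN-style 3-D residual encoder-decoder, trimmed to two stages for brevity; the actual network is deeper, and all sizes are illustrative.

import torch.nn as nn

class REDCNN3D(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv3d(1, ch, 5, padding=2)
        self.enc2 = nn.Conv3d(ch, ch, 5, padding=2)
        self.dec2 = nn.ConvTranspose3d(ch, ch, 5, padding=2)
        self.dec1 = nn.ConvTranspose3d(ch, 1, 5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        e1 = self.relu(self.enc1(x))
        e2 = self.relu(self.enc2(e1))
        d2 = self.relu(self.dec2(e2) + e1)   # shortcut from the encoder
        return self.relu(self.dec1(d2) + x)  # residual mapping to the input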
In FDK reconstruction, the distribution of noise power differs between axial slices (i.e., high-pass noise) and coronal slices (i.e., low-pass or white noise), which may result in different detectability of the same objects. In this work, we examined the denoising performance of convolutional neural networks trained using axial and coronal slice images separately, and how the direction of the image slice affects the detectability of small objects in denoised images. We used a modified version of U-Net. For network training, we used the Adam optimizer with a learning rate of 0.001, a batch size of 4, and a VGG loss. The XCAT simulator was used to generate the training, validation, and test datasets. Projection data were acquired by Siddon’s method for the XCAT phantoms, and different levels of Poisson noise were added to the projection data to generate quarter-dose and normal-dose CT images, which were then reconstructed by the FDK algorithm. The reconstructed quarter-dose and normal-dose CT images were used as the training, validation, and test datasets for our network. The denoised output images from U-Net-Axial (i.e., the network trained using axial images) and U-Net-Coronal (i.e., the network trained using coronal images) were evaluated using the structural similarity index (SSIM) and mean squared error (MSE). The results showed that output images from both U-Net-Axial and U-Net-Coronal show improved image quality compared to quarter-dose images. However, the detectability of small objects was higher in U-Net-Coronal.
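A minimal sketch of building the two training sets from a reconstructed volume, assuming (z, y, x) axis ordering; the file name is hypothetical.

import numpy as np

volume = np.load("recon_volume.npy")  # hypothetical reconstructed volume
axial_slices = [volume[z] for z in range(volume.shape[0])]       # (y, x) images
coronal_slices = [volume[:, y] for y in range(volume.shape[1])]  # (z, x) images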
Convolutional neural networks (CNNs) are now the most promising denoising method for low-dose computed tomography (CT) images. The goal of denoising is to restore original details as well as to reduce noise, and the performance is largely determined by the loss function of the CNN. In this work, we investigate the denoising performance of CNNs with three different loss functions on low-dose CT images: mean squared error (MSE), perceptual loss using the pretrained VGG network (VGG loss), and a weighted summation of the MSE and VGG losses (VGGMSE loss). The CNNs are trained to map quarter-dose CT images to normal-dose CT images in a supervised fashion. The image quality of the denoised images is evaluated by the normalized root mean squared error (NRMSE), structural similarity index (SSIM), mean and standard deviation (SD) of HU values, and the task SNR (tSNR) of the non-prewhitening eye-filter model observer (NPWE). Our results show that the CNN trained with the MSE loss achieves the best NRMSE and SSIM scores despite significant image blur. On the other hand, the CNN trained with the VGG loss reports the best SD score with well-preserved details but has the worst accuracy in the mean HU value. The CNN trained with the VGGMSE loss shows the best performance in terms of tSNR and the mean HU value, and consistently high performance on the other metrics. In conclusion, the VGGMSE loss can mitigate the drawbacks of the MSE and VGG losses and is thus much more effective than either for CT denoising tasks.
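A minimal sketch of the weighted VGGMSE loss, assuming a fixed pretrained VGG16 feature extractor from torchvision; the paper's exact VGG layer choice and weight w are not specified here, and ImageNet input normalization is omitted for brevity.

import torch.nn as nn
from torchvision.models import vgg16

# Frozen feature extractor for the perceptual (VGG) term.
vgg_features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def vggmse_loss(denoised, target, w=0.1):  # w: assumed weighting factor
    mse = nn.functional.mse_loss(denoised, target)
    # VGG expects 3-channel input; repeat the single CT channel.
    f_d = vgg_features(denoised.repeat(1, 3, 1, 1))
    f_t = vgg_features(target.repeat(1, 3, 1, 1))
    return mse + w * nn.functional.mse_loss(f_d, f_t)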
In recent years, the CNN has been gaining attention as a powerful denoising tool since the pioneering work [7], which developed a 3-layer convolutional neural network (CNN). However, the 3-layer CNN may lose details or contrast after denoising due to its shallow depth. In this study, we propose a deeper, 7-layer CNN for denoising low-dose CT images. We introduce dimension shrinkage and expansion steps to keep the number of parameters under control, and also apply batch normalization to alleviate optimization difficulties. The network was trained and tested with Shepp-Logan phantom images reconstructed by the FBP algorithm from projection data generated in a fan-beam geometry. For the training and test sets, independently generated uniform noise at different noise levels was added to the projection data. The image quality improvement was evaluated both qualitatively and quantitatively, and the results show that the proposed CNN effectively reduces the noise without resolution loss compared to BM3D and the 3-layer CNN.
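A minimal sketch of a 7-layer architecture with channel shrinkage and expansion plus batch normalization; channel counts are illustrative, not the paper's.

import torch.nn as nn

def conv_bn_relu(cin, cout, k=3):
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

model = nn.Sequential(
    conv_bn_relu(1, 64),
    conv_bn_relu(64, 32),  # shrink
    conv_bn_relu(32, 16),  # shrink
    conv_bn_relu(16, 16),
    conv_bn_relu(16, 32),  # expand
    conv_bn_relu(32, 64),  # expand
    nn.Conv2d(64, 1, 3, padding=1),
)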
A finite focal spot size is one of the sources that degrade the resolution performance of a fan-beam CT system. In this work, we investigated the effect of the finite focal spot size on signal detectability. For the evaluation, five spherical objects with diameters of 1 mm, 2 mm, 3 mm, 4 mm, and 5 mm were used. The optical focal spot size viewed at the iso-center was 1 mm (height) × 1 mm (width) with a target angle of 7 degrees, corresponding to an 8.21 mm (i.e., 1 mm / sin(7°)) focal spot length. Simulated projection data were acquired using 8 × 8 sourcelets and reconstructed by Hanning-weighted filtered backprojection. For each spherical object, the detectability was calculated at (0 mm, 0 mm) and (0 mm, 200 mm) using two image quality metrics: pixel signal-to-noise ratio (SNR) and detection SNR. For all signal sizes, the pixel SNR is higher at the iso-center, since the noise variance at the off-center position is much higher than that at the iso-center due to the backprojection weightings used in direct fan-beam reconstruction. In contrast, the detection SNR shows similar values at the two locations for the different spherical objects, except for the 1 mm and 2 mm diameter objects. Overall, the results indicate that the resolution loss caused by the finite focal spot size degrades detection performance, especially for small objects less than 2 mm in diameter.
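As a worked example of the quoted geometry and the two metrics, a minimal sketch; the paper's exact metric definitions may differ, and the detection SNR below is the standard non-prewhitening form.

import numpy as np

# Focal spot length from the stated geometry: 1 mm / sin(7 deg) ≈ 8.21 mm.
focal_spot_length = 1.0 / np.sin(np.deg2rad(7.0))

def pixel_snr(mean_signal_pixel, noise_std_pixel):
    # Signal amplitude at a pixel divided by the local noise standard deviation.
    return mean_signal_pixel / noise_std_pixel

def detection_snr_npw(delta_s, noise_cov):
    # Non-prewhitening detection SNR: (s^T s) / sqrt(s^T K s), with s the
    # expected signal (flattened) and K the noise covariance matrix.
    s = delta_s.ravel()
    return (s @ s) / np.sqrt(s @ noise_cov @ s)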