1. Introduction

The development of image denoising methods for medical imaging applications based on deep neural networks (DNNs) remains an active area of research.1–7 Although learning-based image denoising methods, by conventional design, can improve traditional image quality (IQ) measures such as root mean square error (RMSE) and structural similarity index measure (SSIM), it is well known that such measures may not always correlate with objective task-based IQ measures.8–11 Here and throughout this article, a “task” denotes an image-based inference to be performed by a human or numerical observer. This is because the loss functions that are commonly employed to train such methods do not explicitly take into account the intended task that is to be performed by use of the resulting images. For example, Yu et al.10 demonstrated that task-based metrics were not consistent with traditional IQ metrics in a study of DNN-based image denoising related to nuclear medicine imaging. Likewise, Li et al.9 reported similar findings and systematically investigated task-related information loss induced by DNN-based denoising methods under different conditions. Task-information loss has also been studied within the context of the learning-based single-image super-resolution problem.12

To enhance the utility of an image produced by the use of a learning-based method, information regarding a task can be naturally incorporated into the training procedure.13–16 A variety of task-informed methods employ a hybrid loss comprised of a conventional component and a task-based loss component. For an image reconstruction problem, Adler et al.13 proposed such an approach to establish a learned reconstruction operator. Similarly, Ongie et al.14 designed a low-dose computed tomography (CT) reconstruction framework to enhance the detectability of signals. For enhancing the utility of denoised images for segmentation tasks, Zhang et al.17 proposed a task-informed low-dose CT denoising framework that employed a hybrid loss that incorporated the dice score loss. In a different approach that did not employ the hybrid loss strategy, Han et al.15 proposed a perceptual loss-based denoising method. Although these studies provide valuable insights into the potential of learning-based task-informed image formation and restoration methods, this line of research is relatively new and underdeveloped. Improved utility of the estimated image for the specified task generally comes at the cost of degraded task-agnostic measures of image quality, and understanding this complicated trade-off within the context of a specific problem is important. Although task-related information has been incorporated into loss function designs, the use of transfer learning coupled with constraints on how such information is utilized during model fine-tuning remains unexplored. This is potentially important because previous studies have reported that task-related information may be primarily lost by the deeper layers of a DNN for certain applications.9

Another critical issue that is specifically relevant to task-informed learning-based methods for image formation or restoration relates to generalization performance with respect to the task. Tasks in medical imaging applications are generally complicated and can be difficult to comprehensively specify, either analytically or implicitly via the specification of a collection of acquired images.
For example, a signal detection task requires the specification of the signal to be detected, the background in which it is embedded, and the measurement noise. All of these quantities are stochastic in nature, and the former two will vary with the subject and disease state in the specified cohort. When a task-informed image formation or restoration method is trained with consideration of a specified detection task, it is anticipated, by design, that the resulting images will possess enhanced utility for performing that particular task. However, at inference time, the characteristics of the signal, background, or noise may differ from those modeled in the original task. This is a phenomenon that we refer to as “task-shift,” indicating that source tasks (used for training) are different from target tasks (used for inference).18 Assessing the robustness of a task-informed image formation or restoration method to task-shift is essential to understanding its potential suitability for clinical translation.

In this work, numerical studies are performed to yield insights into fundamental issues related to the incorporation of signal detection task information into a learned image denoising method. Consider that medical images are denoised by use of a DNN and the clinical task of interest is to detect a signal in the denoised images. The following two questions motivate the study design: (1) How do traditional and task-based measures of IQ covary when a conventionally trained DNN is fine-tuned by use of a hybrid task-informed loss function with all weights being frozen except for select deep layers? and (2) What is the impact of task-shift on the IQ measures, and what is the relative influence of the source and target task complexity?

A virtual imaging test bed is employed to enable a systematic exploration of these questions. The test bed comprises a stylized computational model of a chest X-ray CT imaging system coupled with high-fidelity clinical CT images that represent the to-be-imaged objects. From simulated noisy projection data, images that contain lesions are reconstructed. These images are subsequently denoised by use of a learned method. Although a vast number of DNN-based image denoising methods are available and new ones are being developed at a breakneck pace, in this work, a canonical, fully supervised, convolutional neural network (CNN)-based denoising method is purposely adopted. This will facilitate a basic analysis and understanding of the underlying issues that may be relevant to a variety of applications and more advanced denoising or image reconstruction methods. Signal detection and signal detection-localization tasks are considered, and several distinct types of numerical observers are employed to compute estimates of the task performance. The studies will reveal how a task-informed transfer learning approach can influence the tradeoff between conventional and task-based measures of image quality within the context of the considered tasks. In addition, for the first time, insights into the behavior of a learned denoising method when task-shift is present are revealed.

The remainder of the paper is organized as follows. Section 2 describes the necessary background on DNN-based image denoising, signal detection tasks, and numerical observers. The task-informed training method considered in this work is described in Sec. 3. The numerical studies and associated results are described in Secs. 4 and 5, respectively.
Finally, the article concludes with a discussion of the key findings in Sec. 6.

2. Background

2.1. Learning-Based Image Denoising

End-to-end learning-based denoising methods hold significant potential for medical imaging applications.1,4,7,19–21 Given a noisy image $\mathbf{g} \in \mathbb{R}^N$, where $N$ is the dimension of the image, an end-to-end learning-based denoising method is described generically as

$\hat{\mathbf{f}} = \mathcal{D}_{\boldsymbol{\theta}}(\mathbf{g})$,   (1)

where $\mathcal{D}_{\boldsymbol{\theta}}: \mathbb{R}^N \to \mathbb{R}^N$ denotes an image-to-image mapping implemented by a DNN that is parameterized by the weight vector $\boldsymbol{\theta}$ and $\hat{\mathbf{f}}$ denotes the denoised image. Depending on how the target data are defined when training the DNN, $\hat{\mathbf{f}}$ can be interpreted as an estimate of the noiseless image or an estimate of a reduced noise version of $\mathbf{g}$. A variety of DNNs have been employed to implement the mapping $\mathcal{D}_{\boldsymbol{\theta}}$,1,21 and convolutional neural networks (CNNs) represent a popular choice.1–3,6,19

In addition to the choice of DNN architecture, the specification of the loss function plays a key role in the design of a DNN-based denoising method. Mean square error (MSE) that measures the distance between the denoised and target images has been widely employed.3,4,6,19–23 The perceptual loss function has also been used and was reported to be effective in reducing noise while retaining image details,1 and the use of adversarial loss functions has been deployed with similar success.20,24 However, such loss functions that are commonly employed in computer vision applications do not explicitly incorporate information regarding a particular medical imaging task. In recent studies, it has been demonstrated that learning-based denoising methods trained by the use of such loss functions can improve traditional IQ measures such as RMSE or SSIM, whereas important information relevant to a downstream detection task is lost.9,10 Such findings motivate the further development and investigation of task-informed learning-based denoising methods.

2.2. Formulation of Binary Signal Detection Task

In this study, a binary signal detection task that requires an observer to classify a denoised image as satisfying either a signal-present hypothesis $H_1$ or a signal-absent hypothesis $H_0$ is considered. These two hypotheses are described as

$H_0: \mathbf{g} = \mathbf{f}_0 + \mathbf{n}$,  $H_1: \mathbf{g} = \mathbf{f}_1 + \mathbf{n}$,   (2)

where $\mathbf{f}_1$ and $\mathbf{f}_0$ denote signal-present and signal-absent noiseless images, respectively, and $\mathbf{n}$ denotes the measurement noise. A signal-present image is formulated by inserting a signal image $\mathbf{f}_s$ into a background object $\mathbf{f}_b$. In a signal-known-exactly (SKE) detection task, $\mathbf{f}_s$ is non-random, whereas, in a signal-known-statistically (SKS) detection task, it is a random process. Similarly, in a background-known-exactly (BKE) detection task, $\mathbf{f}_b$ is nonrandom, whereas, in a background-known-statistically (BKS) detection task, it is a random process.

In addition, detection-localization tasks in which the signal could be located at one of $J$ distinct locations were considered.25 In this case, an observer is required to classify an image as satisfying one of $J + 1$ hypotheses (i.e., one signal-absent hypothesis and $J$ signal-present hypotheses). The imaging processes under these hypotheses are represented as

$H_0: \mathbf{g} = \mathbf{f}_0 + \mathbf{n}$,  $H_j: \mathbf{g} = \mathbf{f}_{1,j} + \mathbf{n}$,   (3)

where $j = 1, \ldots, J$ and $\mathbf{f}_{1,j}$ is a signal-present noiseless image with the signal at the $j$'th location.
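To make the data models of Eqs. (2) and (3) concrete, the following minimal NumPy sketch draws signal-present and signal-absent images by inserting a rotationally symmetric Gaussian signal into a random background and adding independent Gaussian noise. The ROI size, the background statistics, the signal profile, and the noise level are illustrative assumptions only and do not correspond to the lung-CT backgrounds, inserted nodules, or CT noise model employed in the studies of Sec. 4.

import numpy as np

rng = np.random.default_rng(0)
N = 64  # ROI side length (illustrative assumption)

def gaussian_signal(amplitude=0.2, width=3.0, center=(32, 32)):
    """Rotationally symmetric Gaussian signal image f_s (illustrative stand-in for a nodule)."""
    y, x = np.mgrid[0:N, 0:N]
    r2 = (x - center[1]) ** 2 + (y - center[0]) ** 2
    return amplitude * np.exp(-r2 / (2.0 * width ** 2))

def sample_image(signal_present, noise_std=0.05):
    """Draw one noisy image g under H1 (signal present) or H0 (signal absent), per Eq. (2)."""
    f_b = rng.normal(0.5, 0.05, size=(N, N))      # stochastic background f_b (BKS)
    f_s = gaussian_signal() if signal_present else 0.0
    n = rng.normal(0.0, noise_std, size=(N, N))   # measurement noise n
    return f_b + f_s + n                          # g = f_b + f_s + n

g_h1 = sample_image(signal_present=True)
g_h0 = sample_image(signal_present=False)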
2.3. Numerical Observers for Objective IQ Assessment

In preliminary assessments of medical imaging technologies, numerical observers (NOs) have been employed to quantify task-based measures of IQ for various image-based inferences.26 The NOs employed in this study are surveyed below.

2.3.1. Hotelling observer

The Hotelling observer (HO) employs the Hotelling discriminant, which is the population equivalent of the Fisher linear discriminant, and is optimal among all linear observers in the sense that it maximizes the signal-to-noise ratio of the test statistic.8,27 For binary signal detection tasks, the HO test statistic computed by the use of the denoised data $\hat{\mathbf{f}}$ is defined as

$t_{HO}(\hat{\mathbf{f}}) = \mathbf{w}_{HO}^{T}\,\hat{\mathbf{f}}$, with $\mathbf{w}_{HO} = \mathbf{K}_{\hat{\mathbf{f}}}^{-1}\,\Delta\bar{\hat{\mathbf{f}}}$,   (4)

where $\mathbf{w}_{HO}$ denotes the Hotelling template, $\Delta\bar{\hat{\mathbf{f}}} = \bar{\hat{\mathbf{f}}}_1 - \bar{\hat{\mathbf{f}}}_0$ denotes the difference between the ensemble mean of the image data under the two hypotheses $H_1$ and $H_0$, and $\mathbf{K}_{\hat{\mathbf{f}}} = \tfrac{1}{2}(\mathbf{K}_0 + \mathbf{K}_1)$. Here, $\mathbf{K}_0$ and $\mathbf{K}_1$ denote the covariance matrices corresponding to $\hat{\mathbf{f}}$ under $H_0$ and $H_1$, respectively.

In some cases, the covariance matrices $\mathbf{K}_0$ and $\mathbf{K}_1$ are ill-conditioned, and therefore, their inverse cannot be stably computed. To address this, a regularized HO (RHO) can be employed that implements the test statistic as9

$t_{RHO}(\hat{\mathbf{f}}) = \big[\tilde{\mathbf{K}}_{\hat{\mathbf{f}}}^{+}\,\Delta\bar{\hat{\mathbf{f}}}\big]^{T}\,\hat{\mathbf{f}}$,   (5)

where $\tilde{\mathbf{K}}_{\hat{\mathbf{f}}}$ represents a low-rank approximation of $\mathbf{K}_{\hat{\mathbf{f}}}$ that is formed by keeping only the singular values of $\mathbf{K}_{\hat{\mathbf{f}}}$ greater than $\tau\,\sigma_{\max}$. Here, $\tau$ is a tunable parameter, and $\sigma_{\max}$ represents the largest singular value of $\mathbf{K}_{\hat{\mathbf{f}}}$. Finally, $\tilde{\mathbf{K}}_{\hat{\mathbf{f}}}^{+}$ is the Moore–Penrose inverse of $\tilde{\mathbf{K}}_{\hat{\mathbf{f}}}$.

2.3.2. Channelized Hotelling observer

A channelized HO (CHO) is formed when the HO is employed with a channeling mechanism. When implemented with anthropomorphic channels and an internal noise mechanism, the CHO can be interpreted as an anthropomorphic observer and attempts to predict the human observer performance.28,29 In addition, the channeling mechanism can be employed to reduce the dimensionality of the image data when the image data are insufficient to accurately estimate the covariance matrix. Let $\mathbf{T}$ denote a channel matrix and $\mathbf{v} = \mathbf{T}^{T}\hat{\mathbf{f}}$ denote the corresponding channelized image data. The CHO test statistic is given as

$t_{CHO}(\mathbf{v}) = \Delta\bar{\mathbf{v}}^{T}\,(\mathbf{K}_{\mathbf{v}} + \mathbf{K}_{\boldsymbol{\epsilon}})^{-1}\,(\mathbf{v} + \boldsymbol{\epsilon})$,   (6)

where $\mathbf{K}_{\mathbf{v}}$ denotes the covariance matrix of the channelized data $\mathbf{v}$, $\mathbf{K}_{\boldsymbol{\epsilon}}$ denotes the covariance matrix of the channel internal noise, and $\boldsymbol{\epsilon}$ is a noise vector sampled from a Gaussian distribution $\mathcal{N}(\mathbf{0}, \mathbf{K}_{\boldsymbol{\epsilon}})$. Based on previous studies,29 in this work, $\mathbf{K}_{\boldsymbol{\epsilon}}$ is defined as

$\mathbf{K}_{\boldsymbol{\epsilon}} = \alpha\,\mathrm{diag}(\mathbf{K}_{\mathbf{v}})$,   (7)

where $\mathrm{diag}(\mathbf{K}_{\mathbf{v}})$ represents a diagonal matrix with diagonal elements from $\mathbf{K}_{\mathbf{v}}$ and $\alpha$ is the internal noise level. The parameters of the difference-of-Gaussian (DOG) channels and the internal noise level employed in this study are described in Sec. 4.3.4.

2.3.3. Learned NOs

Recently, several machine learning methods have been proposed to establish NOs.30–36 The single-layer neural network (SLNN)-based NO (SLNN-NO) is a special learned NO that has the shallowest architecture, possessing only a single fully connected layer with a bias term and a sigmoid activation function. This architecture can be employed for different tasks through the specification of the loss function. For example, the binary cross entropy (BCE) can be used to train the SLNN-NO for binary signal detection tasks, whereas the categorical cross-entropy loss function can be employed for detection-localization tasks. A SLNN-based method has also been proposed to approximate the HO.30 This NO will be referred to as the SLNN-HO and will also be employed in the studies below. The SLNN-HO is useful when the estimation and/or inversion of the image covariance matrix is intractable.
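The HO and RHO of Eqs. (4) and (5) can be estimated empirically from samples of denoised signal-absent and signal-present images. The following NumPy sketch illustrates one such estimate; the toy data, the sample sizes, the threshold value tau, and the nonparametric (Mann–Whitney) AUC estimate are assumptions made for illustration only, and the studies reported below instead fit ROC curves by use of the proper binormal model, as described in Sec. 4.3.5.

import numpy as np

def rho_template(f_h0, f_h1, tau=1e-3):
    """Estimate the (regularized) Hotelling template from image samples.

    f_h0, f_h1: arrays of shape (num_images, N) holding vectorized
    signal-absent and signal-present images, respectively.
    """
    delta_mean = f_h1.mean(axis=0) - f_h0.mean(axis=0)
    cov = 0.5 * (np.cov(f_h0, rowvar=False) + np.cov(f_h1, rowvar=False))
    u, s, vt = np.linalg.svd(cov)
    keep = s > tau * s.max()                              # retain singular values above tau * sigma_max
    cov_pinv = (vt[keep].T / s[keep]) @ u[:, keep].T      # Moore-Penrose inverse of the low-rank approximation
    return cov_pinv @ delta_mean

def auc_mann_whitney(t_h0, t_h1):
    """Nonparametric AUC estimate computed from test-statistic samples."""
    return np.mean(t_h1[:, None] > t_h0[None, :]) + 0.5 * np.mean(t_h1[:, None] == t_h0[None, :])

# Example usage with toy data (the arrays would hold denoised images in practice):
rng = np.random.default_rng(1)
f_h0 = rng.normal(0.0, 1.0, size=(2000, 64))
f_h1 = rng.normal(0.2, 1.0, size=(2000, 64))
w = rho_template(f_h0, f_h1)
auc = auc_mann_whitney(f_h0 @ w, f_h1 @ w)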
3. Task-Informed Training Method

A transfer learning approach is investigated in which a DNN is pre-trained by the use of a conventional (non-task-informed) loss function and subsequently fine-tuned by the use of a hybrid loss that includes a task component $\mathcal{L}_{task}$. The fine-tuning of the denoising network is constrained to the last several layers instead of re-training the whole network. This is motivated by a recent study by Li et al.9 who demonstrated that, at least for linear CNN-based denoising networks, the degradation of task-relevant information primarily occurs in the last layers. This behavior can be explained by noting that the last layer of the denoising network transforms a high-dimensional feature tensor into the denoised output image. Therefore, the transform possesses a null space and is non-invertible.9

A hybrid loss function is defined as13

$\mathcal{L}_{Hybrid}(\boldsymbol{\theta}_D, \boldsymbol{\theta}_O) = (1 - \lambda)\,\mathcal{L}_{phy}(\boldsymbol{\theta}_D) + \lambda\,\mathcal{L}_{task}(\boldsymbol{\theta}_D, \boldsymbol{\theta}_O)$,   (8)

where $\lambda \in [0, 1]$ is a scalar parameter, $\mathcal{L}_{phy}$ is the physical loss component, $\mathcal{L}_{task}$ is the task component, $\boldsymbol{\theta}_D$ is the vector of weight parameters associated with the trainable layers in the pretrained denoising network, and $\boldsymbol{\theta}_O$ denotes the vector of weight parameters of the neural network (NN)-based NO used to compute the task component $\mathcal{L}_{task}$. The task component is designed to measure the performance of a NO on a specific task. By appending a network-based NO to the pretrained denoising network, the denoised image can be transformed into a scalar that is used to compute the task-specific component. The trainable layers in the pretrained denoising network are jointly trained with the NO. By employing this training strategy, the NO used to compute $\mathcal{L}_{task}$ can be easily adapted to different tasks. The details of the proposed task-informed training method are described below and summarized in Procedure 1.

Procedure 1. General procedure of the task-informed training method
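The fine-tuning step summarized in Procedure 1 can be sketched in Keras as follows, under the assumptions that the SLNN-NO is employed to compute $\mathcal{L}_{task}$, that the physical and task components are combined as in Eq. (8), and that only the last few layers of the pretrained denoiser are left trainable. The function names, layer selection, and optimizer settings are illustrative and are not the exact implementation used in this work.

import tensorflow as tf
from tensorflow import keras

def build_task_informed_model(denoiser: keras.Model, num_trainable_layers: int = 3):
    """Freeze all but the last few layers of a pretrained denoiser and append an SLNN-NO head."""
    for layer in denoiser.layers[:-num_trainable_layers]:
        layer.trainable = False
    noisy = keras.Input(shape=denoiser.input_shape[1:])
    denoised = denoiser(noisy)
    # SLNN-NO head: a single fully connected layer (with bias) and a sigmoid activation.
    score = keras.layers.Dense(1, activation="sigmoid", name="slnn_no")(
        keras.layers.Flatten()(denoised))
    return keras.Model(inputs=noisy, outputs=[denoised, score])

def compile_for_fine_tuning(model: keras.Model, lam: float):
    """Hybrid loss: the MSE term acts on the denoised image and the BCE term on the NO output,
    combined as (1 - lam) * MSE + lam * BCE, following Eq. (8)."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss=[keras.losses.MeanSquaredError(), keras.losses.BinaryCrossentropy()],
        loss_weights=[1.0 - lam, lam],
    )

# Illustrative usage (assuming a pretrained denoiser, noisy images g, targets f, and labels y):
# model = build_task_informed_model(pretrained_denoiser, num_trainable_layers=3)
# compile_for_fine_tuning(model, lam=0.9)
# model.fit(g, [f, y], batch_size=100)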
Mean squared error (MSE) and mean absolute error are commonly employed choices for $\mathcal{L}_{phy}$. The selection of the task-based loss component $\mathcal{L}_{task}$ is based on specific tasks. In this paper, binary signal detection tasks were considered, and the specific formulation of $\mathcal{L}_{task}$ is described in Sec. 4.2.

4. Numerical Studies

Computer-simulation studies using a stylized X-ray CT virtual imaging test bed were conducted to gain insights into the fundamental issues described in Sec. 1. Signal-known-statistically (SKS) with background-known-statistically (BKS) signal detection and signal detection-localization tasks were considered. Both the SLNN-NO and SLNN-HO were employed to compute estimates of the task performance, which were employed to evaluate the impact of the task-informed training procedure described in Procedure 1 on the considered denoising network.

4.1. Virtual Imaging Pipeline

The Lung Image Database Consortium image collection37 was employed to generate signal-present (SP) images and signal-absent (SA) images to perform binary signal detection tasks as defined in Eq. (2). This database consists of 243,945 2D image slices from 1018 3D thoracic CT reconstructed images, in which 10,706 image slices contain annotated nodules. A total of 100,000 SA images were formed by extracting regions of interest (ROIs) from normal lung areas in several central slices. To generate SP images, an established insertion method38 was employed to insert realistic nodules into 50,000 generated SA images. In SP images, the centroids of the nodules were either located at a fixed location or at random locations subject to the specific tasks described in Sec. 4.3. The generated SP and SA images were utilized as the target (normal-dose) CT images $\mathbf{f}$.

The corresponding noise-enhanced (low-dose) images were generated by degrading the target images described above. A canonical fan-beam CT imager with a linear detector geometry was considered for noise simulation. To produce these images, the true continuous-to-discrete forward operator was approximated by a discrete-to-discrete operator $\mathbf{H}$ that was implemented by use of the torch-radon toolbox.39 The scanning angular range of the modeled fan-beam system was 360 degrees, and 256 evenly spaced tomographic views were considered. The assumed distance between the X-ray source and the center of the object and the distance between the detector and the center of the object were 400 and 400 mm, respectively. The number of detector elements was 512, and each element was 0.8 mm in size. During the simulated imaging process, the forward operator was applied to the entire chest cross-section, covering the system's full field of view. Noise-enhanced projection data $\hat{\mathbf{g}}$ were generated as8,39

$\hat{\mathbf{g}} = -\ln\!\big(\mathcal{P}\{\bar{\mathbf{q}}\}/I_0\big)$,

where $\mathcal{P}$ is a Poisson noise generator acting on the transformed measurement data $\bar{\mathbf{q}}$. Here, $\bar{\mathbf{q}} = I_0 \exp(-\bar{\mathbf{g}})$, and $\bar{\mathbf{g}} = \mathbf{H}\mathbf{f}$, where $I_0$ is the beam intensity. The noisy (low-dose) images were then reconstructed from $\hat{\mathbf{g}}$ by the use of a filtered back-projection reconstruction algorithm that employed a Ram-Lak filter.40 As described below, the proposed denoising method was applied to ROIs within the reconstructed images. These ROIs were situated within the lung area, with their center locations uniformly distributed over that region. Figure 1 shows examples of ROIs employed in our studies.
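A minimal sketch of the low-dose data simulation described above is given below. The parallel-beam radon/iradon routines of scikit-image are used as a stand-in for the fan-beam torch-radon operator employed in the study, and the beam intensity and toy attenuation map are illustrative assumptions.

import numpy as np
from skimage.transform import radon, iradon

rng = np.random.default_rng(0)

def simulate_low_dose_roi(f, i0=1e4, num_views=256):
    """Forward project, apply Poisson noise in the transmission domain, and reconstruct with FBP."""
    angles = np.linspace(0.0, 360.0, num_views, endpoint=False)
    g_bar = radon(f, theta=angles)                      # noiseless line integrals H f
    q_bar = i0 * np.exp(-g_bar)                         # mean transmitted photon counts
    q = rng.poisson(q_bar).astype(np.float64)           # Poisson noise generator
    g_noisy = -np.log(np.clip(q, 1.0, None) / i0)       # noisy (low-dose) projection data
    return iradon(g_noisy, theta=angles, filter_name="ramp")  # FBP with a Ram-Lak (ramp) filter

f_toy = np.zeros((128, 128)); f_toy[48:80, 48:80] = 0.02     # toy attenuation map
g_low_dose = simulate_low_dose_roi(f_toy)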
4.2. Training and Validation Details

4.2.1. Architecture and loss function for denoising networks

The canonical CNN architecture of depth $D$ depicted in Fig. 2 was employed with the task-informed training method to establish an end-to-end learned denoising method. It is important to note that the assessment studies described below can be readily repeated with any other DNNs. The network input was a reconstructed noisy ROI, and the output was a denoised image with the same dimensions. The CNN contained four types of layers. The first layer was a Conv + ReLU layer, in which 64 convolution filters were applied to generate 64 feature maps. In each of the 2nd to $(D-2)$'th layers, which were Conv + BN + ReLU layers, 64 convolution filters were employed, and batch normalization was included between the convolution and ReLU operations. In the Conv + BN layer, 64 convolution filters were employed, and batch normalization was performed. In the last Conv layer, a single convolution filter was employed to form the final denoised image.

Let $\mathbf{f}$ denote a given SA or SP target (normal-dose) image, and let $\mathbf{g}$ denote the corresponding noise-enhanced (low-dose) image. Given a collection of paired training data $\{(\mathbf{g}^{(i)}, \mathbf{f}^{(i)})\}_{i=1}^{N_{tr}}$, the denoising network was pretrained by minimizing the MSE loss function:

$\mathcal{L}_{MSE}(\boldsymbol{\theta}_D) = \frac{1}{N_{tr}} \sum_{i=1}^{N_{tr}} \big\| \mathbf{f}^{(i)} - \mathcal{D}_{\boldsymbol{\theta}_D}(\mathbf{g}^{(i)}) \big\|_2^2$,

where the vector $\boldsymbol{\theta}_D$ denotes the weight parameters of the denoising network.

4.2.2. Architecture and loss function for the NN-based observers used to compute $\mathcal{L}_{task}$

The physical loss function $\mathcal{L}_{phy}$ in Eq. (8) was defined by an MSE loss. The task component $\mathcal{L}_{task}$ of the hybrid loss function was computed by the use of either the SLNN-NO or SLNN-HO, as described next. The SLNN-NO consisted of a fully connected layer along with a sigmoid activation function. The BCE loss function was employed to train the SLNN-NO. Let $\hat{\mathbf{f}}^{(i)}$ denote the image data and $y^{(i)} \in \{0, 1\}$ the corresponding label. The BCE loss function is expressed as30

$\mathcal{L}_{BCE}(\boldsymbol{\theta}_D, \boldsymbol{\theta}_O) = -\frac{1}{N_{tr}} \sum_{i=1}^{N_{tr}} \big[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \big]$,

where $\hat{y}^{(i)}$ denotes the SLNN-NO output corresponding to the $i$'th denoised image. Here, $\boldsymbol{\theta}_D$ is the vector of weight parameters associated with the trainable layers in the pretrained denoising network, and the vector $\boldsymbol{\theta}_O$ denotes the weight parameters of the fully connected layer of the appended SLNN-NO. Differently, the SLNN-HO is trained by use of the loss function proposed in Ref. 30, which yields an approximation of the Hotelling template.

4.2.3. Datasets and denoising network training details

The standard convention of utilizing separate training/validation/testing datasets was adopted. The training dataset included 40,000 pairs of noisy signal-present and signal-absent images along with the corresponding target (normal-dose) images. The validation dataset, which included 200 signal-present images, 200 signal-absent images, and the corresponding target (normal-dose) images, was randomly selected from the training dataset. Finally, the testing dataset comprised 10,000 signal-present images and 10,000 signal-absent noisy images. For task-informed model training with the hybrid loss function $\mathcal{L}_{Hybrid}$, the same training dataset described above was employed to fine-tune the denoising network. The validation and testing datasets used for pretraining were also employed to evaluate the performance of the fine-tuned denoising networks.

In both the pretraining and task-informed fine-tuning stages, the denoising networks were trained on mini-batches at each iteration by the use of the Adam optimizer41 with a learning rate of 0.0001. Each mini-batch contained 50 signal-present images and 50 signal-absent images that were randomly selected from the training dataset. The network model that possessed the best performance on the validation dataset was selected for use. The Keras library42 was employed for implementing and training all networks on a single NVIDIA TITAN X GPU.
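A minimal Keras sketch of the canonical CNN denoiser described in Sec. 4.2.1 and of the MSE pretraining configuration of Sec. 4.2.3 is given below; the network depth, the 3 x 3 filter size, and the input dimensions shown are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

def build_denoiser(depth: int = 8, input_shape=(64, 64, 1), filters: int = 64) -> keras.Model:
    """Canonical CNN denoiser: Conv+ReLU, (depth - 3) x Conv+BN+ReLU, Conv+BN, Conv."""
    x_in = keras.Input(shape=input_shape)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x_in)   # 1st layer: Conv + ReLU
    for _ in range(depth - 3):                                               # middle Conv + BN + ReLU layers
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)                         # Conv + BN layer
    x = layers.BatchNormalization()(x)
    x_out = layers.Conv2D(1, 3, padding="same")(x)                           # final Conv layer
    return keras.Model(x_in, x_out)

# Pretraining with the MSE loss and the Adam optimizer (learning rate 1e-4), as in Sec. 4.2.3:
denoiser = build_denoiser(depth=8)
denoiser.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                 loss=keras.losses.MeanSquaredError())
# denoiser.fit(g_train, f_train, batch_size=100, validation_data=(g_val, f_val))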
4.3. Objective Evaluation of Image Quality

4.3.1. SKS/BKS binary signal detection tasks with fixed signal locations

The task-informed training method was evaluated for SKS/BKS binary signal detection tasks for which known signal locations were considered. The centroids of the nodules were located at the center of the extracted ROIs. The incident flux $I_0$ was used to determine the noise level in the simulated noisy images. When fine-tuning the denoising networks, the SLNN-NO and SLNN-HO were employed as NOs to compute the task component $\mathcal{L}_{task}$ in Eq. (8). Here, $\mathcal{L}_{task}$ was defined as the BCE and HO loss functions in Sec. 4.2.2 when the SLNN-NO and SLNN-HO were employed, respectively. The SLNN-NO, HO, RHO, and DOG-CHO were employed for subsequent assessments of image quality. It should be noted that, for the case in which the SLNN-NO was employed to compute $\mathcal{L}_{task}$, the SLNN-NO employed for objective image quality assessment was trained on the denoised estimates, and it was not identical to that used to compute $\mathcal{L}_{task}$. The use of these NOs for evaluation represented a situation in which the NO for evaluation may not be identical to the NO used to optimize the denoising networks. Several values of the weight parameter $\lambda$ in Eq. (8) were considered. Only the last three convolutional layers of the denoising network were set to be trainable for both cases. Based on these settings, the impact of the weight parameter $\lambda$ on the performance of the considered NOs was investigated.

4.3.2. SKS/BKS binary signal detection tasks with random signal locations

In this case, the centroids of the nodules were randomly located within the lung area of extracted ROIs by the use of a uniform probability density function. The incident flux $I_0$ was used to determine the noise level of the simulated low-dose images. The SLNN-NO was used to compute the task component $\mathcal{L}_{task}$ in Eq. (8), considering that the SLNN-NO can be employed when the signal is randomly located. The trained SLNN-NO was subsequently utilized to evaluate the performance of fine-tuned denoising networks. This represented a situation in which the same observer was used for both training and evaluation. To assess the impact of the weight parameter $\lambda$ on the performance of the SLNN-NO, several values of $\lambda$ in Eq. (8) were considered. The number of trainable layers was also swept from 0 to 4.

4.3.3. Investigation of the impact of task-shifts

Test cases with different weight parameters

A study was designed to investigate the robustness of the task-informed image denoising method to task-shift. First, binary signal detection tasks with fixed signal locations were considered for model training (source tasks), whereas tasks with random signal locations were considered for evaluation (target tasks). Next, the tasks with random signal locations were used as source tasks, and the tasks with fixed locations were considered target tasks. Detection-localization tasks were also considered. The tasks with two and four possible signal locations were considered to be both source/target and target/source tasks to study the impact of task-shift. The considered test cases are outlined in Table 1.

Table 1. Test cases designed for the investigation of the impact of task-shifts described in Sec. 4.3.3. Here, BSD and D&L represent binary signal detection tasks and detection-localization tasks, respectively.
The SLNN-NO was employed to compute the task component $\mathcal{L}_{task}$ in Eq. (8), and only the last three convolutional layers were set to be trainable. For evaluations, SLNN-NOs were independently trained on training datasets for the target tasks. The SLNN-NO performance under the situations without task-shift was considered the reference. Several values of the weight parameter $\lambda$ in Eq. (8) were considered to investigate the impact of task-shifts when the weight of the task-based component varies.

Test cases with gradually increased mismatches in source/target tasks

Studies were designed to simulate situations in which the mismatch between the source task and target task was gradually increased. In the case of a binary signal detection task, the source task was a binary signal detection task with a fixed signal location. For the target tasks, the signal was randomly located within circles of increasing radii. Detection-localization tasks were also considered. Here, the source task possessed two fixed signal locations, and tasks with varying numbers of fixed signal locations were considered target tasks. In these tasks, the signal was randomly located within one of the considered possible locations. The SLNN-NO was employed to compute the task component $\mathcal{L}_{task}$ in Eq. (8) for network training. Only the last three convolutional layers were set to be trainable, and a fixed value of the weight parameter $\lambda$ was employed. For evaluations, SLNN-NOs were independently trained on different training datasets designed for the target tasks. The SLNN-NO performance under the situations without task-shift was considered a reference.

4.3.4. Numerical observer computation

Both the HO and RHO were employed for objective image quality assessment because they are optimal linear observers. For computing the HO and RHO test statistics, the covariance matrix was empirically estimated by the use of 40,000 signal-present and 40,000 signal-absent images. When computing the RHO test statistic, the threshold parameter $\tau$ in Eq. (5) was swept over a range of values, and the corresponding detection performance was estimated based on a separate validation dataset, including 200 signal-present images and 200 signal-absent images. The $\tau$ value that led to the best RHO detection performance was selected. For computing the CHO test statistic, 2000 signal-present and 2000 signal-absent images were utilized to empirically estimate the channelized covariance matrix. A set of 10 DOG channels29 was employed. The internal noise level $\alpha$ was 2.5, which was the same value employed by Abbey and Barrett.29 To independently train the SLNN-NO for objective image quality assessment, 40,000 signal-present images and 40,000 signal-absent images were employed. These learned NOs were trained by use of the Adam optimizer41 with a learning rate of 0.0001.

4.3.5. Evaluation metrics

Both traditional and task-based measures of IQ were employed for assessments. Receiver operating characteristic (ROC) analysis was conducted, and area under the curve (AUC) values were computed and employed as a figure-of-merit for task-based measures. The ROC curves were fit by the use of the Metz-ROC software43 that employs the proper binormal model.44 The uncertainty of the AUC values was estimated as well. Two commonly used traditional metrics (i.e., RMSE and SSIM) were employed as task-agnostic measures to assess the denoised images.
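A minimal NumPy sketch of a DOG-channel CHO of the form of Eqs. (6) and (7) is given below. The specific DOG parameter values shown (10 channels with the dense-DOG settings reported by Abbey and Barrett29) and the use of sample covariance estimates are assumptions made for illustration.

import numpy as np

def dog_channels(n=64, num_channels=10, sigma0=0.005, alpha_dog=1.4, q=1.67):
    """Build DOG channel images (columns of the channel matrix T) in the Fourier domain."""
    fx = np.fft.fftfreq(n)                          # spatial frequencies in cycles/pixel
    rho = np.sqrt(fx[:, None] ** 2 + fx[None, :] ** 2)
    channels = []
    for j in range(num_channels):
        sigma_j = sigma0 * alpha_dog ** j
        c_freq = np.exp(-0.5 * (rho / (q * sigma_j)) ** 2) - np.exp(-0.5 * (rho / sigma_j) ** 2)
        c_img = np.real(np.fft.ifft2(c_freq))       # channel function in the image domain
        channels.append(np.fft.fftshift(c_img).ravel())
    return np.stack(channels, axis=1)               # shape (n*n, num_channels)

def cho_test_statistics(v_h0, v_h1, internal_noise=2.5, rng=None):
    """CHO test statistics with internal channel noise (Eqs. 6 and 7), given channelized data."""
    rng = rng or np.random.default_rng(0)
    delta_v = v_h1.mean(axis=0) - v_h0.mean(axis=0)
    k_v = 0.5 * (np.cov(v_h0, rowvar=False) + np.cov(v_h1, rowvar=False))
    k_eps = internal_noise * np.diag(np.diag(k_v))  # K_eps = alpha * diag(K_v)
    w = np.linalg.solve(k_v + k_eps, delta_v)
    eps0 = rng.multivariate_normal(np.zeros(len(w)), k_eps, size=len(v_h0))
    eps1 = rng.multivariate_normal(np.zeros(len(w)), k_eps, size=len(v_h1))
    return (v_h0 + eps0) @ w, (v_h1 + eps1) @ w

# Usage: channelize vectorized 64 x 64 images as v = images_flat @ dog_channels(n=64)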
5. Results

In Secs. 5.1 and 5.2, the results are presented to reveal the impact of the task-informed training method on the tradeoff between conventional and objective measures of image quality within the context of the considered tasks. In addition, in Sec. 5.3, the results are reported to investigate the impact of denoising on objective measures of IQ when task-shift was introduced.

5.1. Results for the Case with Fixed Signal Locations

The task-informed training method was evaluated for SKS/BKS binary signal detection tasks in which known signal locations were considered. Here, these tasks were employed for both training and evaluation of the task-informed training method, such that no task-shift was present. Section 5.1.1 describes the impact of the weight parameter $\lambda$ on the performance of the employed NOs. In Sec. 5.1.2, the changes in the covariance matrix induced by the task-informed training method were also investigated to gain insights into the observer performance.

5.1.1. Impact of the weight parameter

The impact of the weight parameter $\lambda$ in Eq. (8) on the signal detection performance as measured by AUC is shown in Fig. 3. Both the SLNN-NO and SLNN-HO described in Sec. 4.2.2 were considered to be the NO to compute the task component $\mathcal{L}_{task}$ in Eq. (8). The signal detection performance was evaluated by the use of the SLNN-NO, HO, RHO, and DOG-CHO acting on the denoised images. For both cases, the performance of the four different NOs on the denoised images was higher when larger $\lambda$ values (larger weight for the task-based loss) were considered. Those results confirm that the task-informed training method can improve the NO performance even when the NOs employed for objective image quality assessment were different from the NO used to compute $\mathcal{L}_{task}$ during model training.

Figure 3 yields two additional noteworthy findings. First, for the case in which the SLNN-HO was employed for training [panel (a)], the performance of the HO employed for objective image quality assessment was relatively high (statistically equivalent to that of the SLNN-NO and RHO). However, this relatively high HO performance was only observed for very large $\lambda$ values (e.g., 0.99) in the case in which the SLNN-NO [panel (b)] was employed. The HO performance was much lower when relatively small $\lambda$ values (i.e., 0.01 to 0.9) were employed. The RHO performance was employed as a reference and was relatively high for all cases. These observations suggest that, for the case in which the SLNN-HO was employed for training, the second- and potentially higher-order statistical properties of the images were optimized to benefit the HO performance, but such behavior did not occur in the case in which the SLNN-NO with small $\lambda$ values was considered. Second, when the SLNN-HO was used for training, the performance of the DOG-CHO was greatly improved for large $\lambda$ values and was not significantly improved for other cases. This observation indicates that the DOG channels were “closer” to efficient channels when $\lambda$ was appropriately selected. However, this behavior was not observed in the case in which the SLNN-NO was employed for training.

5.1.2. Changes in the covariance matrix induced by the task-informed training method

To gain insights into the behavior of the HO performance, the singular value spectra of the covariance matrices corresponding to the images denoised by the task-informed training method were further examined.
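A brief sketch of this type of analysis is given below: the covariance matrix of the vectorized denoised images is estimated empirically, and its singular value spectrum (and condition number) is inspected before and after task-informed fine-tuning. The estimator shown and the variable names are illustrative assumptions.

import numpy as np

def covariance_spectrum(images_flat):
    """Singular value spectrum of the empirical covariance matrix of vectorized images."""
    k = np.cov(images_flat, rowvar=False)
    return np.linalg.svd(k, compute_uv=False)

# Illustrative comparison (variable names are hypothetical):
# s_pre  = covariance_spectrum(denoised_by_pretrained_network)
# s_fine = covariance_spectrum(denoised_by_fine_tuned_network)
# print(s_pre.max() / s_pre.min(), s_fine.max() / s_fine.min())  # condition numbers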
The results, shown in Fig. 4, reveal that the covariance matrix corresponding to the denoised images produced by the use of the pretrained denoising network was ill-conditioned, whereas that corresponding to the denoised images produced by the use of the fine-tuned denoising network was well-conditioned when the SLNN-HO was employed to compute the task component $\mathcal{L}_{task}$ in Eq. (8). However, for the case in which the SLNN-NO was employed to compute $\mathcal{L}_{task}$, a similar observation only occurred for very large $\lambda$ values (e.g., 0.99) in Eq. (8), and the covariance matrices were still ill-conditioned for small $\lambda$ values [Fig. 4(b)]. The results of this analysis were consistent with the previously discussed results shown in Fig. 3 and indicated that the task-informed training method may improve the image statistics that are important for signal detection.

5.2. Results for the Case with Random Signal Locations

The impact of the task-informed training method on the tradeoff between conventional and objective measures of IQ was investigated by considering SKS/BKS binary signal detection tasks with random signal locations for both training and evaluation. Section 5.2.1 describes the impact of the weight parameter $\lambda$ and the number of trainable layers on the SLNN-NO performance. In Sec. 5.2.2, a study was also performed to investigate whether the loss of task-relevant information primarily occurs in the last several layers when the denoising network depth increases.

5.2.1. Impact of the weight parameter and number of trainable layers

The impacts of the weight parameter $\lambda$ in Eq. (8) and the number of trainable layers on the signal detection performance as measured by the SLNN-NO are shown in Figs. 5 and 6, respectively. Here, the SLNN-NO employed to compute the task component $\mathcal{L}_{task}$ in Eq. (8) was also employed to assess the signal detection performance. For comparison, the impact on traditional measures of IQ is demonstrated in Table 2. It was observed that, after the task-informed model training, the SLNN-NO signal detection performance was improved, whereas the traditional measures of IQ were degraded compared with those achieved by the pre-trained denoising network.

Table 2. Relationships between RMSE and the number of trainable layers in the denoising network and the weight parameter $\lambda$ in $\mathcal{L}_{Hybrid}$. The quantity $N_{train}$ denotes the number of trainable layers. The values shown in the column related to $N_{train} = 0$ represent cases in which RMSE and SSIM were calculated on images denoised by the pretrained non-task-informed denoising network. Additional details are provided in Sec. 5.2.1.
As shown in Fig. 5 and Table 2, for all numbers of trainable layers, the task performance increased as a function of $\lambda$. In addition, the degradation of traditional metrics was significant for relatively large $\lambda$ but insignificant for small $\lambda$ values. As expected, the tradeoff between traditional and task-based measures of IQ can be controlled by $\lambda$. For example, for an appropriately chosen $\lambda$, the resulting AUC value was greatly improved, whereas the RMSE and SSIM were statistically equivalent to those of the pretrained denoising network. As shown in Fig. 6, significant improvements in the task performance were achieved by fine-tuning the last (or last several) convolutional layer(s) (e.g., 0 to 2), whereas the improvement was insignificant when more layers were trainable. For traditional IQ metrics (i.e., RMSE and SSIM), Table 2 shows that the degradation mainly resulted from the last few convolutional layer(s). For smaller $\lambda$ values, the changes in traditional IQ metrics were statistically insignificant.

The denoised estimates produced by the task-informed image denoising methods were also subjectively assessed. Figure 7 shows a noisy image and denoised images generated by denoising networks fine-tuned with the hybrid loss for different values of $\lambda$ in Eq. (8). The denoised estimates were blurred as a result of the task-informed training, and the level of blur increased when $\lambda$ increased.

5.2.2. Impact of denoising network depth

A study was performed to investigate whether the loss of task-relevant information primarily occurs in the last several layers when the denoising network depth increases. As shown in Table 3, the SLNN-NO performance decreased as a function of denoising network depth, which is consistent with previous findings9 that the mantra “deep is better” may not always hold for objective IQ measures. After the task-informed training, the SLNN-NO performance was improved, and the variations of the improved SLNN-NO performance were statistically insignificant when the network depth varied. No matter how deep the pretrained denoising network was, the loss of task-relevant information still occurred in the last few layers (i.e., it was not related to the depth of the denoising network), at least in the considered cases.

Table 3. Relationship between the signal detection performance achieved by the SLNN-NO and the depth $D$ of the denoising networks. The SLNN-NO performance on images denoised by the denoising networks trained with $\mathcal{L}_{Hybrid}$ and by the pretrained non-task-informed denoising networks was quantified. For each of the denoising networks of varying depth, only the last three layers were fine-tuned with $\mathcal{L}_{Hybrid}$. The standard error for AUC values is 0.003. The results indicated that the loss of task-relevant information only occurred in the last (or last several) layer(s), regardless of the depth of the denoising network.
5.3. Impact of Task-Shifts between Training and Evaluation

5.3.1. Test cases with different weight parameters

The robustness of the task-informed image denoising method to task-shift was also assessed, and the results are shown in Figs. 8 and 9. The SLNN-NO performance for the case with no task-shift was considered a reference. It was observed that introducing task-shift always degraded the task performance, as expected, and that the degradation resulting from task-shift became insignificant when $\lambda$ decreased. This is due to the smaller weight placed on the task-based loss component as $\lambda$ decreases, which makes the impact of task-shift less significant. For the case in which a relatively simple task was used for training and the complex one was used for evaluation [Figs. 8(a) and 9(a)], the degradation in the task performance was much more significant than in the case in which a complex task was used for training and a simple one was used for evaluation [Figs. 8(b) and 9(b)]. For the binary signal detection tasks shown in Fig. 8, this observation is due to the fact that the case with random signal locations can be easily generalized to the case with fixed signal locations but not vice versa. Similar findings were observed for the detection-localization tasks with both two and four possible signal locations, as shown in Fig. 9. It was found that, when the tasks with four and two possible locations were considered the source/target tasks, less significant degradation in the task performance was observed when compared with the case in which the tasks with two and four possible locations were considered the source/target tasks. This suggested that employing a relatively complex task for training can better improve the robustness of a task-informed image restoration method to task-shift than employing a simple task.

5.3.2. Test cases with gradually increased mismatches in source/target tasks

Another test case was performed to assess the robustness of the task-informed image denoising method to task-shift, and the results are shown in Fig. 10. The SLNN-NO performance for cases without task-shift was considered a reference. As expected, it was observed that introducing task-shift always degraded the task performance. As shown in Fig. 10(a), for binary signal detection tasks, the SLNN-NO performance decreased as a function of the radius of the circle in which the signal was randomly located. In addition, the SLNN-NO performance gap between cases with and without task-shift increased when the radius became larger. Similar findings were also observed for the detection-localization tasks, as shown in Fig. 10(b). It was observed that the SLNN-NO performance decreased as a function of the number of possible signal locations. In addition, the SLNN-NO performance gap between cases with and without task-shift increased as a function of the number of possible signal locations. This suggests that, when employing the task-informed denoising method, potential task-shifts between training and evaluation need to be carefully investigated.

6. Discussion and Summary

In this work, a task-informed DNN-based image denoising method that preserves task-specific information was objectively evaluated. This study was motivated by previous works9,10 that indicated that traditional DNN-based denoising methods may not benefit the task performance even though the traditional measures of IQ were improved. The task-informed model training method employed a hybrid loss strategy and only acted on the last several layers of a DNN-based denoising method.
To evaluate the method, binary signal detection tasks with fixed and random signal locations under SKS/BKS conditions were considered. The performance of the SLNN-NO, SLNN-HO, and common NOs was quantified to assess the impact of task-informed training on task-performance preservation. The numerical results indicated that certain tradeoffs can be achieved such that the resulting AUC value was significantly improved while the degradation of physical IQ measures was statistically insignificant. The improvement in the signal detection performance of the considered NOs for evaluation can be explained by a singular value spectra analysis. It was revealed that the considered task-informed transfer learning approach could mitigate the ill-conditioning of covariance matrices and has the potential to improve the image statistics that are important for signal detection. In addition, it was observed that significant improvements in the task performance were achieved by fine-tuning the last (or last several) convolutional layer(s), whereas the improvement was insignificant when more layers were trainable, which confirmed that the loss of task-relevant information occurred in the last few layers, at least in the considered cases.

To better understand the potential suitability of a task-informed image restoration method for clinical translation, its robustness to task-shift was also assessed. It was observed that introducing task-shift degraded the task performance, as expected. The degradation was significant when a relatively simple task was considered the source task and a complex one was used as the target task. The degradation can be potentially mitigated by employing the complex task as the source task and the simple one as the target task. This suggests that employing a relatively complex task for training can better improve the robustness of a task-informed image restoration method to task-shift than employing a simple task.

There remain numerous important topics for future investigation. In this work, the SLNN-NO and SLNN-HO were employed to compute the task component $\mathcal{L}_{task}$ in Eq. (8). Anthropomorphic numerical observers (ANOs) may instead be employed to predict the human observer performance.29,45,46 Employing an ANO to compute $\mathcal{L}_{task}$ may potentially benefit a task-informed image denoising method if humans are the ultimate readers of the image. The evaluation study in this paper focused on significant parameters such as the weight parameter $\lambda$ in Eq. (8) and the number of trainable layers. Other parameters, such as the size of the training dataset and the ratio between signal-present and signal-absent images, remain unexplored. The extension of the proposed method for use with more complex tasks such as detection-estimation tasks32,47 is also an important topic. The task-informed image denoising method and the corresponding assessment strategy can also readily be applied to different image restoration and reconstruction methods. Ultimately, it will be critical to conduct human reader studies to assess the benefit of any task-informed method.

Disclosures

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Code and Data Availability

Code and data will be made publicly available upon acceptance of the paper.

Acknowledgments

This work was supported in part by the National Institutes of Health [Grant Nos. EB031772 (sub-project 6366), EB034249, CA233873, DE033344, and CA287778].
This work has also been funded by the Jump ARCHES endowment through the Health Care Engineering Systems Center. Preliminary results of this work were presented at the 2022 SPIE Medical Imaging Conference and published as an SPIE Proceedings paper.48

References

1. K. Gong et al., "PET image denoising using a deep neural network through fine tuning," IEEE Trans. Radiat. Plasma Med. Sci., 3(2), 153–161, https://doi.org/10.1109/TRPMS.2018.2877644 (2018).
2. X. You et al., "Denoising of MR images with Rician noise using a wider neural network and noise range division," Magn. Reson. Imaging, 64, 154–159, https://doi.org/10.1016/j.mri.2019.05.042 (2019).
3. A. Manduca et al., "Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT," Med. Phys., 36(11), 4911–4919, https://doi.org/10.1118/1.3232004 (2009).
4. Z. Li et al., "Adaptive nonlocal means filtering based on local noise level for CT denoising," Med. Phys., 41(1), 011908, https://doi.org/10.1118/1.4851635 (2014).
5. J.-W. Lin, A. F. Laine and S. R. Bergmann, "Improving PET-based physiological quantification through methods of wavelet denoising," IEEE Trans. Biomed. Eng., 48(2), 202–212, https://doi.org/10.1109/10.909641 (2001).
6. A. Le Pogam et al., "Denoising of PET images by combining wavelets and curvelets for improved preservation of resolution and quantitation," Med. Image Anal., 17(8), 877–891, https://doi.org/10.1016/j.media.2013.05.005 (2013).
7. H. Chen et al., "Low-dose CT denoising with convolutional neural network," in IEEE 14th Int. Symp. Biomed. Imaging (ISBI 2017), 143–146, https://doi.org/10.1109/ISBI.2017.7950488 (2017).
8. H. H. Barrett and K. J. Myers, Foundations of Image Science, John Wiley & Sons (2013).
9. K. Li et al., "Assessing the impact of deep neural network-based image denoising on binary signal detection tasks," IEEE Trans. Med. Imaging, 40, 2295–2305, https://doi.org/10.1109/TMI.2021.3076810 (2021).
10. Z. Yu et al., "AI-based methods for nuclear-medicine imaging: need for objective task-specific evaluation," J. Nucl. Med., 61(Suppl. 1), 575 (2020).
11. K. Li et al., "Task-based performance evaluation of deep neural network-based image denoising," Proc. SPIE, 11599, 115990L, https://doi.org/10.1117/12.2582324 (2021).
12. X. Zhang et al., "Impact of deep learning-based image super-resolution on binary signal detection," J. Med. Imaging, 8(6), 065501, https://doi.org/10.1117/1.JMI.8.6.065501 (2021).
13. J. Adler et al., "Task adapted reconstruction for inverse problems," Inverse Prob., 38, 075006, https://doi.org/10.1088/1361-6420/ac28ec (2021).
14. G. Ongie et al., "Optimizing model observer performance in learning-based CT reconstruction," Proc. SPIE, 12035, 120350A, https://doi.org/10.1117/12.2613050 (2022).
15. M. Han, H. Shim and J. Baek, "Low-dose CT denoising via convolutional neural network with an observer loss function," Med. Phys., 48(10), 5727–5742, https://doi.org/10.1002/mp.15161 (2021).
16. K. Li, H. Li and M. A. Anastasio, "On the impact of incorporating task-information in learning-based image denoising" (2022).
17. J. Zhang et al., "Task-oriented low-dose CT image denoising," Lect. Notes Comput. Sci., 12906, 441–450, https://doi.org/10.1007/978-3-030-87231-1_43 (2021).
18. S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., 22(10), 1345–1359, https://doi.org/10.1109/TKDE.2009.191 (2009).
19. K. Zhang et al., "Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising," IEEE Trans. Image Process., 26(7), 3142–3155, https://doi.org/10.1109/TIP.2017.2662206 (2017).
20. Q. Yang et al., "Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Trans. Med. Imaging, 37(6), 1348–1357, https://doi.org/10.1109/TMI.2018.2827462 (2018).
21. W. Jifara et al., "Medical image denoising using convolutional neural network: a residual learning approach," J. Supercomput., 75(2), 704–718, https://doi.org/10.1007/s11227-017-2080-0 (2019).
22. H. Chen et al., "Low-dose CT with a residual encoder-decoder convolutional neural network," IEEE Trans. Med. Imaging, 36(12), 2524–2535, https://doi.org/10.1109/TMI.2017.2715284 (2017).
23. J. V. Manjón et al., "New methods for MRI denoising based on sparseness and self-similarity," Med. Image Anal., 16(1), 18–27, https://doi.org/10.1016/j.media.2011.04.003 (2012).
24. J. M. Wolterink et al., "Generative adversarial networks for noise reduction in low-dose CT," IEEE Trans. Med. Imaging, 36(12), 2536–2545, https://doi.org/10.1109/TMI.2017.2708987 (2017).
25. P. Khurd and G. Gindi, "Decision strategies that maximize the area under the LROC curve," IEEE Trans. Med. Imaging, 24(12), 1626–1636, https://doi.org/10.1109/TMI.2005.859210 (2005).
26. H. H. Barrett et al., "Model observers for assessment of image quality," Proc. Natl. Acad. Sci. U. S. A., 90(21), 9758–9765, https://doi.org/10.1073/pnas.90.21.9758 (1993).
27. R. A. Fisher, "The use of multiple measurements in taxonomic problems," Ann. Eugen., 7(2), 179–188, https://doi.org/10.1111/j.1469-1809.1936.tb02137.x (1936).
28. K. J. Myers and H. H. Barrett, "Addition of a channel mechanism to the ideal-observer model," J. Opt. Soc. Am. A, 4(12), 2447–2457, https://doi.org/10.1364/JOSAA.4.002447 (1987).
29. C. K. Abbey and H. H. Barrett, "Human- and model-observer performance in ramp-spectrum noise: effects of regularization and object variability," J. Opt. Soc. Am. A, 18(3), 473–488, https://doi.org/10.1364/JOSAA.18.000473 (2001).
30. W. Zhou, H. Li and M. A. Anastasio, "Approximating the ideal observer and Hotelling observer for binary signal detection tasks by use of supervised learning methods," IEEE Trans. Med. Imaging, 38(10), 2456–2468, https://doi.org/10.1109/TMI.2019.2911211 (2019).
31. W. Zhou, H. Li and M. A. Anastasio, "Approximating the ideal observer for joint signal detection and localization tasks by use of supervised learning methods," IEEE Trans. Med. Imaging, 39, 3992–4000, https://doi.org/10.1109/TMI.2020.3009022 (2020).
32. K. Li et al., "A hybrid approach for approximating the ideal observer for joint signal detection and estimation tasks by use of supervised learning and Markov-Chain Monte Carlo methods," IEEE Trans. Med. Imaging, 41, 1114–1124, https://doi.org/10.1109/TMI.2021.3135147 (2021).
33. K. Li et al., "Supervised learning-based ideal observer approximation for joint detection and estimation tasks," Proc. SPIE, 11599, 115990F, https://doi.org/10.1117/12.2582327 (2021).
34. K. Li et al., "Estimating task-based performance bounds for image reconstruction methods by use of learned-ideal observers," Proc. SPIE, 12467, 124670I, https://doi.org/10.1117/12.2655241 (2023).
35. K. Li et al., "Application of learned ideal observers for estimating task-based performance bounds for computed imaging systems," J. Med. Imaging, 11(2), 026002, https://doi.org/10.1117/1.JMI.11.2.026002 (2024).
36. S. Sengupta et al., "Investigation of adversarial robust training for establishing interpretable CNN-based numerical observers," Proc. SPIE, 12035, 1203514, https://doi.org/10.1117/12.2613220 (2022).
37. S. G. Armato III et al., "The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans," Med. Phys., 38(2), 915–931, https://doi.org/10.1118/1.3528204 (2011).
38. A. Pezeshk et al., "Seamless insertion of real pulmonary nodules in chest CT exams," Proc. SPIE, 9035, 90351K, https://doi.org/10.1117/12.2043786 (2014).
39. M. Ronchetti, "TorchRadon: fast differentiable routines for computed tomography" (2020).
40. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, SIAM (2001).
41. D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization" (2014).
42. F. Chollet et al., "Keras," https://keras.io (2015).
43. C. Metz, ROC-Kit User's Guide, University of Chicago, Chicago (1998).
44. L. L. Pesce and C. E. Metz, "Reliable and computationally efficient maximum-likelihood estimation of “proper” binormal ROC curves," Acad. Radiol., 14(7), 814–829, https://doi.org/10.1016/j.acra.2007.03.012 (2007).
45. Y. Zhang, B. T. Pham and M. P. Eckstein, "The effect of nonlinear human visual system components on performance of a channelized Hotelling observer in structured backgrounds," IEEE Trans. Med. Imaging, 25(10), 1348–1362, https://doi.org/10.1109/TMI.2006.880681 (2006).
46. M. Han and J. Baek, "A convolutional neural network-based anthropomorphic model observer for signal-known-statistically and background-known-statistically detection tasks," Phys. Med. Biol., 65(22), 225025, https://doi.org/10.1088/1361-6560/abbf9d (2020).
47. E. Clarkson, "Estimation receiver operating characteristic curve and ideal observers for combined detection/estimation tasks," J. Opt. Soc. Am. A, 24(12), B91–B98, https://doi.org/10.1364/JOSAA.24.000B91 (2007).
48. K. Li, H. Li and M. A. Anastasio, "A task-informed model training method for deep neural network-based image denoising," Proc. SPIE, 12035, 1203510, https://doi.org/10.1117/12.2613181 (2022).
Biography

Kaiyan Li received his BE degree in telecommunications engineering from Xidian University, Xi’an, China, in 2019. He was with the Department of Bioengineering at the University of Illinois at Urbana–Champaign (UIUC). He is now with Meta Platform Inc. His research interests include task-based image quality assessment, deep learning, and imaging science. He is also a member of SPIE.

Hua Li is a professor in the Department of Radiation Oncology at Washington University in St. Louis, St. Louis, Missouri, USA. Her research work focuses on task-based image quality assessment, deep learning, and development of innovative medical imaging and image analysis techniques to solve the challenges seen in clinical practice, toward improving personalized patient care.

Mark A. Anastasio is the Donald Biggar Willett Professor in Engineering and the head of the Department of Bioengineering at the UIUC. He is a fellow of the Institute of Electrical and Electronic Engineers (IEEE), the Society of Photo-Optical Instrumentation Engineers (SPIE), the American Institute for Medical and Biological Engineering (AIMBE), and the International Academy of Medical and Biological Engineering (IAMBE). His research addresses computational image science, inverse problems in imaging, and machine learning for imaging applications. He has contributed to emerging biomedical imaging technologies, including photoacoustic computed tomography and ultrasound computed tomography.