Regular breast screening with mammography allows for early detection of cancer and reduces breast cancer mortality. However, significant false positive and false negative rates leave opportunities for improving diagnostic accuracy. Computer-aided detection (CAD) software has been available to radiologists for decades to address these issues. However, traditional CAD products have failed to improve interpretation of full-field digital mammography (FFDM) images in clinical practice due to low sensitivity and a large number of false positives per image. Deep learning models have shown promise in improving the performance of radiologists, but they still produce a large number of false positives per image at reasonable sensitivities. In this work, we propose a simple and intuitive two-stage detection framework, named WRDet. WRDet consists of two stages: a region proposal network optimized to maximize sensitivity and a second-stage patch classifier that boosts specificity. We review the rules for matching predicted proposals to ground truth boxes that are commonly used in the mammography CAD literature and compare them in light of the high variability in the quality of ground truth annotations in mammography datasets. We additionally propose a new criterion for matching predicted proposals with loose bounding box annotations that is useful for two-stage CAD systems such as WRDet. Using the common CAD matching criterion that considers a prediction a true positive if its center falls within the ground truth annotation, our system achieves an overall sensitivity of 81.3% and 89.4% at 0.25 and 1 false positive mark per image, respectively. For the task of mass detection, we achieve a sensitivity of 85.3% and 92% at 0.25 and 1 false positive mark per image, respectively. We also compare our results with selected models reported in the literature using different matching criteria. Our results demonstrate the potential of a CAD system that could help improve the accuracy of screening mammography worldwide.
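As a concrete illustration of the center-based matching criterion mentioned above, the following minimal Python sketch marks a predicted box as a true positive when its center falls inside any ground truth box. The function name and box format are illustrative assumptions, not WRDet's implementation.

```python
import numpy as np

def center_in_box_match(pred_boxes, gt_boxes):
    """Return a boolean per prediction: True if the predicted box center
    lies inside any ground truth box. Boxes are (x1, y1, x2, y2).
    Hypothetical helper illustrating the common CAD matching criterion."""
    pred_boxes = np.asarray(pred_boxes, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    if len(pred_boxes) == 0:
        return np.zeros(0, dtype=bool)
    if len(gt_boxes) == 0:
        return np.zeros(len(pred_boxes), dtype=bool)

    # Centers of the predicted boxes.
    cx = (pred_boxes[:, 0] + pred_boxes[:, 2]) / 2.0
    cy = (pred_boxes[:, 1] + pred_boxes[:, 3]) / 2.0

    # Broadcast each center against every ground truth box.
    inside_x = (cx[:, None] >= gt_boxes[None, :, 0]) & (cx[:, None] <= gt_boxes[None, :, 2])
    inside_y = (cy[:, None] >= gt_boxes[None, :, 1]) & (cy[:, None] <= gt_boxes[None, :, 3])
    return (inside_x & inside_y).any(axis=1)

# Example: the first prediction's center falls inside the ground truth box.
preds = [(90, 90, 150, 150), (300, 300, 340, 340)]
gts = [(80, 80, 160, 160)]
print(center_in_box_match(preds, gts))  # [ True False]
```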
As deep learning greatly accelerates the field of computer vision, there has been growing interest in applying deep learning models to predict the presence of cancer in mammography images. However, unlike in conventional object recognition, where one can leverage very large and diverse datasets such as ImageNet, datasets for identifying cancer in mammography images are typically small and potentially non-representative due to the high cost of acquiring medical data and labels. This makes the training and assessment of such models challenging and raises reliability as well as generalizability concerns. In this work, we propose using the jigsaw task as a self-supervised method to pre-train models when unlabeled data is available. We show that models pre-trained with this task outperform randomly initialized models even when they are trained on only half or a quarter of the training set for the malignancy prediction task. In particular, we find that when using only a quarter of the labeled data, a model trained from randomly initialized weights achieves an area under the receiver operating characteristic curve (AUC) of 0.944. In contrast, the model pre-trained with the jigsaw task achieves an AUC of 0.958 when fine-tuned on the same quarter of the training set, outperforming even the model trained on all of the labeled data from random initialization (0.954 AUC). Furthermore, we propose using performance on the jigsaw task as a measure of confidence in the model's predictions, enabling the option to abstain from making a prediction when the model is not confident. We tested multiple strategies for filtering out samples on which the jigsaw model performs poorly and measured the AUC on the remaining pool of samples. We show that the best filtering strategy improves malignancy prediction performance from an AUC of 0.890 on a completely unfiltered, off-site test set from a different country to an AUC of 0.913 on the filtered set.
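The jigsaw pretext task cuts an image into tiles, shuffles them with a permutation drawn from a fixed set, and trains the network to recover the permutation. The sketch below illustrates such an input pipeline in Python with NumPy; the function name, grid size, and permutation set are illustrative assumptions rather than the configuration used in this work.

```python
import numpy as np

def make_jigsaw_example(image, grid=3, permutations=None, rng=None):
    """Cut an image into a grid of tiles and shuffle them with a permutation
    drawn from a fixed set; a model is trained to predict the permutation
    index. Hypothetical sketch of a jigsaw pretext-task input pipeline."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    th, tw = h // grid, w // grid
    # Row-major list of grid*grid tiles cropped from the image.
    tiles = [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
             for r in range(grid) for c in range(grid)]
    if permutations is None:
        # In practice a fixed set of mutually distant permutations is used;
        # random permutations stand in for that set here.
        permutations = [rng.permutation(grid * grid) for _ in range(10)]
    label = int(rng.integers(len(permutations)))
    shuffled = [tiles[i] for i in permutations[label]]
    # The stacked tiles are the network input; the classification target is
    # the index of the permutation that was applied.
    return np.stack(shuffled), label

# Example: a 12-bit image crop, 3x3 jigsaw.
patch = np.random.default_rng(0).integers(0, 4096, size=(384, 384))
tiles, perm_label = make_jigsaw_example(patch)
print(tiles.shape, perm_label)  # (9, 128, 128) and a permutation index
```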
Mammography-based screening has helped reduce the breast cancer mortality rate, but it has also been associated with potential harms due to low specificity, leading to unnecessary exams or procedures, and low sensitivity. Digital breast tomosynthesis (DBT) improves on conventional mammography by increasing both sensitivity and specificity and is becoming common in clinical settings. However, deep learning (DL) models have been developed mainly on conventional 2D full-field digital mammography (FFDM) or scanned film images. Due to a lack of large annotated DBT datasets, it is difficult to train a model on DBT from scratch. In this work, we present methods to generalize a model trained on FFDM images to DBT images. In particular, we use average histogram matching (HM) and DL fine-tuning to generalize an FFDM model to the 2D maximum intensity projection (MIP) of DBT images. In the proposed approach, the differences between the FFDM and DBT domains are reduced via HM, and then the base model, trained on abundant FFDM images, is fine-tuned. When evaluating on image patches extracted around identified findings, we achieve similar areas under the receiver operating characteristic curve (ROC AUC) of ~0.9 for FFDM and ~0.85 for MIP images, compared to a ROC AUC of ~0.75 when the FFDM model is tested directly on MIP images.
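To make the MIP and average histogram matching steps concrete, the following Python/NumPy sketch projects a DBT volume to a 2D MIP and remaps its intensities toward a reference cumulative histogram, such as one averaged over an FFDM training set. The function names, binning scheme, and synthetic data are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def mip(dbt_volume):
    """Maximum intensity projection of a DBT volume (slices, H, W) -> (H, W)."""
    return dbt_volume.max(axis=0)

def match_to_average_histogram(image, ref_cdf, ref_bins):
    """Remap intensities so the image's empirical CDF follows a reference CDF,
    e.g. the average CDF of an FFDM training set. Hypothetical sketch of
    average histogram matching; the paper's exact procedure may differ."""
    flat = image.ravel()
    src_values, src_counts = np.unique(flat, return_counts=True)
    src_cdf = np.cumsum(src_counts) / flat.size
    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_bins)
    return mapped[np.searchsorted(src_values, flat)].reshape(image.shape)

# Build a reference CDF from (synthetic stand-in) FFDM images over a 12-bit range.
rng = np.random.default_rng(0)
ref_bins = np.linspace(0, 4095, 256)
ffdm_images = rng.integers(0, 4096, size=(5, 64, 64))
ref_cdf = np.mean(
    [np.cumsum(np.histogram(x, bins=256, range=(0, 4096))[0]) / x.size
     for x in ffdm_images], axis=0)

dbt_volume = rng.integers(0, 4096, size=(30, 64, 64))
matched = match_to_average_histogram(mip(dbt_volume), ref_cdf, ref_bins)
print(matched.shape)  # (64, 64), intensities mapped toward the FFDM reference
```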