Purpose: Synthetic datasets can offer a cost-effective alternative to clinical data while protecting patient privacy and potentially addressing biases in clinical data. We present a method that uses such datasets to train a machine learning algorithm applied as part of a computer-aided detection (CADe) system.

Approach: Our approach trains a machine learning algorithm on clinically acquired computed tomography (CT) scans of a physical anthropomorphic phantom into which manufactured lesions were inserted. We treated the training database obtained from the anthropomorphic phantom as a simplified representation of clinical data and increased the variability in this dataset using a set of randomized and parameterized augmentations. Furthermore, to mitigate the inherent differences between phantom and clinical datasets, we investigated adding unlabeled clinical data to the training pipeline.

Results: We applied the method to the false positive reduction stage of a lung nodule CADe system for CT scans, in which regions of interest containing potential lesions are classified as nodule or non-nodule regions. The system trained on labeled data from physical phantom scans and unlabeled clinical data achieved a sensitivity of 90% at eight false positives per scan. The experiments also show the benefit of the physical phantom: the competitive performance metric (CPM) increased by 6% when a training set of 50 clinical CT scans was enlarged with the scans obtained from the physical phantom.

Conclusions: The scalability of synthetic datasets can lead to improved CADe performance, particularly when the labeled clinical data are limited in size or subject to inherent bias. Our approach demonstrates an effective use of synthetic datasets for training machine learning algorithms.
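The abstract does not list the individual augmentations, so the following is only a minimal sketch of what a randomized, parameterized augmentation of a 3D region of interest could look like. The function name `augment_roi`, the choice of transforms (random flips, in-plane 90-degree rotations, a global intensity shift, additive Gaussian noise), and the parameters `max_shift_hu` and `noise_sigma_hu` are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np


def augment_roi(roi, rng, max_shift_hu=20.0, noise_sigma_hu=10.0):
    """Apply a random, parameterized augmentation to a 3D ROI (z, y, x).

    All transforms and default parameter values are illustrative assumptions.
    """
    out = np.asarray(roi, dtype=np.float32)
    if out.ndim != 3:
        raise ValueError("expected a 3D region of interest")

    # Random flip along each axis with probability 0.5.
    for axis in range(3):
        if rng.random() < 0.5:
            out = np.flip(out, axis=axis)

    # Random in-plane rotation by a multiple of 90 degrees.
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(1, 2))

    # Random global intensity shift, bounded by max_shift_hu.
    out = out + rng.uniform(-max_shift_hu, max_shift_hu)

    # Additive Gaussian noise with standard deviation noise_sigma_hu.
    out = out + rng.normal(0.0, noise_sigma_hu, size=out.shape)

    return np.ascontiguousarray(out, dtype=np.float32)
```

Passing a seeded `np.random.default_rng(seed)` makes each augmented sample reproducible, and the two parameters bound how far an augmented phantom patch can drift from the acquired one.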
Synthetic datasets can serve as cost-effective alternatives to clinical data and may help mitigate the biases in clinical data. This paper presents a novel method that uses such datasets to train a computer-aided detection (CADe) algorithm. Our approach uses images of a physical anthropomorphic phantom into which manufactured objects representing simplified lesions were inserted, followed by a set of randomized and parameterized augmentations that increase the variability in these datasets. Incorporating these augmentations into the training phase adds variability to training datasets of limited size, with the aim of improving model performance. We apply the method to the false positive reduction stage of a lung nodule CADe system on computed tomography (CT) scans. Our experimental results demonstrate its effectiveness: the Competitive Performance Metric (CPM) of the network increased by 6% when a training set of 50 clinical CT scans was augmented with scans obtained from a physical phantom database.
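For readers unfamiliar with the CPM, it is commonly defined as the mean sensitivity at seven operating points on the free-response ROC curve: 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan. The sketch below computes it from candidate scores under simplifying assumptions that are not taken from the paper: each candidate is already labeled as a nodule or a non-nodule, each nodule is matched by at most one candidate, and the curve is treated as a step function rather than interpolated. The names `froc_points` and `cpm` are hypothetical.

```python
import numpy as np

# Operating points (false positives per scan) commonly used for the CPM.
CPM_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def froc_points(scores, is_nodule, n_scans, n_nodules):
    """Return (false positives per scan, sensitivity) as the threshold is lowered."""
    scores = np.asarray(scores, dtype=float)
    is_nodule = np.asarray(is_nodule, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = is_nodule[order]
    true_pos = np.cumsum(hits)
    false_pos = np.cumsum(~hits)
    return false_pos / n_scans, true_pos / n_nodules


def cpm(scores, is_nodule, n_scans, n_nodules):
    """Mean sensitivity over the seven CPM operating points (step-function FROC)."""
    fps, sens = froc_points(scores, is_nodule, n_scans, n_nodules)
    values = []
    for rate in CPM_FP_RATES:
        allowed = fps <= rate
        # Best sensitivity reachable without exceeding this false positive rate.
        values.append(sens[allowed].max() if allowed.any() else 0.0)
    return float(np.mean(values))
```

The last of the seven operating points, eight false positives per scan, is the same operating point at which a single sensitivity figure is often quoted for this kind of system.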