Histopathology whole-slide images (WSIs) capture detailed structural and morphological features of tumor tissue, offering rich histological and molecular information to support clinical practice. With the development of artificial intelligence, deep learning (DL) methods have emerged to assist in the automatic analysis of histopathology WSIs, alleviating the need for tedious, time-consuming, and error-prone manual inspection by clinicians. Nevertheless, applying DL models to histopathology WSI analysis remains challenging because of the intrinsic complexity of the histological characteristics of tumor tissue, the high image resolution, and the large image size. In this study, we propose a transformer-based classifier with feature aggregation for cancer subtype classification from histopathology WSIs that addresses these challenges. Our method offers three advantages that improve classification performance. First, an aggregate transformer decoder learns both global and local features from WSIs. Second, the transformer architecture enables the decoder to learn spatial correlations among different regions in a WSI. Third, the self-attention mechanism of the transformer facilitates the generation of saliency maps that highlight regions of interest in WSIs. We evaluated our model on three cancer subtype classification tasks and demonstrated its effectiveness.
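As a rough illustration of the aggregation step described in this abstract, the following PyTorch sketch pools patch-level embeddings from a WSI with a transformer whose self-attention mixes global and local context; a slide-level token is used for classification, and its attention weights over patch tokens could be read out to form a patch-level saliency map. This is a minimal sketch under assumed shapes and module choices, not the authors' implementation.

```python
# Hypothetical sketch of a transformer-based aggregator for WSI patch features.
# Assumes patch embeddings were extracted beforehand (e.g., by a pretrained CNN);
# all names, dimensions, and the pooling scheme are illustrative assumptions.
import torch
import torch.nn as nn

class WSIAggregateTransformer(nn.Module):
    def __init__(self, feat_dim=512, n_heads=8, n_layers=2, n_classes=3):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, feat_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats):
        # patch_feats: (B, N, feat_dim) -- N patch embeddings per slide
        cls = self.cls_token.expand(patch_feats.size(0), -1, -1)
        tokens = torch.cat([cls, patch_feats], dim=1)   # prepend a slide-level token
        encoded = self.encoder(tokens)                  # self-attention mixes global/local context
        return self.head(encoded[:, 0])                 # classify from the slide token

# Example: 200 patch embeddings from a single slide
logits = WSIAggregateTransformer()(torch.randn(1, 200, 512))
```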
Medical image classification plays a vital role in disease diagnosis, tumor staging, and various other clinical applications. Deep learning (DL) methods have become increasingly popular for medical image classification. However, medical images have unique characteristics that pose challenges for training DL-based models, including limited annotated data, imbalanced class distributions, and large variations in lesion structure. Self-supervised learning (SSL) has emerged as a promising way to alleviate these issues by learning useful representations directly from large-scale unlabeled data. In this study, a new generative SSL method based on the StyleGAN generator is proposed for medical image classification. The style generator, pretrained on large-scale unlabeled data, is integrated into the classification framework to extract, through image reconstruction, style features that encapsulate essential semantic information from the input images. The extracted style feature serves as an auxiliary regularization term, transferring knowledge learned from unlabeled data to support the training of the classification network and enhance its performance. To enable efficient feature fusion, a self-attention module is designed to integrate the style generator with the classification framework, dynamically focusing on the feature elements most relevant to classification. In addition, a sequential training strategy is designed to train the classification model on a limited number of labeled images while leveraging large-scale unlabeled data to improve classification performance. Experimental results on a chest X-ray image dataset demonstrate superior classification performance and robustness compared with traditional DL-based methods. The effectiveness and potential of the model are also discussed.
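To make the feature-fusion and auxiliary-regularization ideas above concrete, the sketch below fuses a frozen, SSL-pretrained style feature with a classifier feature via self-attention and adds a simple consistency term to the loss. The style extractor is treated as a black box, and the module names, shapes, regularizer, and loss weighting are assumptions for illustration, not the proposed method's exact design.

```python
# Hypothetical sketch: self-attention fusion of a pretrained style feature with a
# classification feature, plus an auxiliary regularization term that transfers
# knowledge from unlabeled data. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusionClassifier(nn.Module):
    def __init__(self, backbone_dim=512, style_dim=512, n_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=backbone_dim, num_heads=4,
                                          batch_first=True)
        self.style_proj = nn.Linear(style_dim, backbone_dim)
        self.head = nn.Linear(backbone_dim, n_classes)

    def forward(self, img_feat, style_feat):
        # img_feat: (B, D) from the classification backbone
        # style_feat: (B, S) from the frozen, SSL-pretrained style generator
        q = img_feat.unsqueeze(1)                       # query: image feature
        kv = self.style_proj(style_feat).unsqueeze(1)   # key/value: style feature
        fused, _ = self.attn(q, kv, kv)                 # attend to style information
        return self.head(fused.squeeze(1))

def training_step(model, img_feat, style_feat, labels, lam=0.1):
    logits = model(img_feat, style_feat)
    ce = F.cross_entropy(logits, labels)
    # Auxiliary regularization (assumed form): keep backbone features close to the
    # projected style embedding learned from unlabeled data.
    reg = F.mse_loss(img_feat, model.style_proj(style_feat))
    return ce + lam * reg

# Toy usage with random features and labels
feats, styles = torch.randn(8, 512), torch.randn(8, 512)
loss = training_step(AttentionFusionClassifier(), feats, styles, torch.randint(0, 2, (8,)))
```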
Accurate classification of medical images is crucial for disease diagnosis and treatment planning. Deep learning (DL) methods have gained increasing attention in this domain. However, DL-based classification methods face challenges arising from the unique characteristics of medical image datasets, including the limited number of labeled images and large image variations. Self-supervised learning (SSL) has emerged as a solution that learns informative representations from unlabeled data to alleviate the scarcity of labeled images and improve model performance. A recently proposed generative SSL method, the masked autoencoder (MAE), has shown excellent capability in feature representation learning. An MAE model trained on unlabeled data can be easily fine-tuned to improve the performance of various downstream classification models. In this paper, we performed a preliminary study integrating MAE with the self-attention mechanism for tumor classification on breast ultrasound (BUS) data. Considering the speckle noise and image-quality variations of BUS images, as well as the varying tumor shapes and sizes, two modifications were made when applying MAE to tumor classification. First, the MAE's patch size and masking ratio were adjusted to avoid losing information embedded in small lesions on BUS images. Second, attention maps were extracted to improve the interpretability of the model's decision-making process. Experiments demonstrated the effectiveness and potential of the MAE-based classification model on small labeled datasets.
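The snippet below illustrates the kind of adjustment described in the first modification: pretraining an MAE with a smaller patch size and a lower masking ratio so that small BUS lesions are less likely to be masked out entirely. The MaskedAutoencoderViT interface follows the public reference MAE implementation (assumed to be available on the path); the specific values shown are assumptions, not the settings used in the paper.

```python
# Illustrative MAE pretraining configuration for BUS images (values are assumptions).
import torch
from models_mae import MaskedAutoencoderViT  # reference MAE code, assumed available

mae = MaskedAutoencoderViT(
    img_size=224,
    patch_size=8,                              # finer patches than the default 16,
    embed_dim=384, depth=6, num_heads=6,       # so small lesions span several patches
    decoder_embed_dim=256, decoder_depth=4, decoder_num_heads=8,
)

imgs = torch.randn(4, 3, 224, 224)             # batch of (channel-replicated) BUS images
loss, pred, mask = mae(imgs, mask_ratio=0.5)   # lower masking ratio than the default 0.75
loss.backward()
```

After pretraining, the encoder weights would be transferred to the downstream tumor classifier, and its self-attention maps can be visualized to indicate which image regions drive the prediction.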