Deep learning-based methods have achieved significant improvements in the accuracy of lung-disease diagnosis from chest X-rays. However, their black-box nature and lack of interpretability undermine physicians' confidence in machine-generated decisions, which continues to limit their use in clinical practice. In this paper, we propose VProtoNet, a novel interpretable deep learning model that produces heatmaps highlighting diagnostically important image features of lung diseases and reveals how the model makes decisions based on them. VProtoNet generates heatmaps by comparing the features extracted by a Vision Transformer with prototypes learned within the model, each of which represents a typical part of a chest X-ray image. We further reduce each heatmap to a single similarity score that serves as the basis for the model's classification. To verify the effectiveness of our model, we applied it to the ChestX-ray14 dataset and achieved an accuracy of 72.35%. We also analyzed the feature maps generated during classification and found that they intuitively reflect the model's recognition of diseased areas, enabling physicians to better understand its decision-making process.
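The prototype comparison described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the use of cosine similarity, the function name, and the max-pooling reduction are assumptions in the spirit of prototype-based networks; each prototype's per-patch similarities form the heatmap, and the maximum over patches gives the single score used for classification.

```python
import numpy as np

def prototype_similarity(patch_features, prototypes):
    """Compare ViT patch embeddings against learned prototype vectors.

    patch_features: (n_patches, d) embeddings from the Vision Transformer
    prototypes:     (n_protos, d) learned prototype vectors
    Returns:
      sim_map: (n_protos, n_patches) per-patch similarities (the heatmap)
      scores:  (n_protos,) one similarity score per prototype
    """
    # Normalize so the dot product is a cosine similarity in [-1, 1].
    f = patch_features / np.linalg.norm(patch_features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim_map = p @ f.T                 # heatmap over patches, one row per prototype
    scores = sim_map.max(axis=1)      # collapse each heatmap to a single score
    return sim_map, scores
```

Reshaping each row of `sim_map` back onto the patch grid yields the heatmap that can be overlaid on the chest X-ray.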
KEYWORDS: Error control coding, Semantics, Data modeling, Performance modeling, Error analysis, Education and training, Data corrections, Feature extraction, Deep learning
Text correction aims to determine whether natural-language text contains grammatical, spelling, or other errors and to correct the sentences. Previous work usually adopts byte pair encoding (BPE), which may separate semantically related Chinese characters. In addition, previous models extract only superficial semantic features and cannot capture global, deep semantic relationships. In this paper, we introduce a temporal convolutional network (TCN) to capture multi-scale semantic information and thereby strengthen global semantic representation. The CTC_BERT model uses a synonym-masking strategy to reduce the segmentation of semantically related words and adds a fully connected layer as the error-detection layer. To verify the performance of CTC_BERT, comparison experiments were carried out on the SIGHAN2015+Wang271K Chinese error-correction dataset. The results show that the model reaches an accuracy of 81.4%, outperforming BERT, BART, ConvSeq2Seq, and other conventional models, and effectively improves text error correction.
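The TCN component rests on dilated causal 1-D convolutions: stacking layers with growing dilation rates widens the receptive field exponentially, which is what lets the model aggregate multi-scale, longer-range semantic context than a plain convolution. A minimal NumPy sketch of one such layer, with illustrative shapes and names (not the CTC_BERT code):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution over a token sequence.

    x: (seq_len, d_in) token features; w: (k, d_in, d_out) filter taps.
    Output at position t depends only on x[t], x[t-dilation], ...,
    x[t-(k-1)*dilation], so no future token leaks into the past.
    """
    k, d_in, d_out = w.shape
    seq_len = x.shape[0]
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros((pad, d_in)), x], axis=0)  # left-pad only
    out = np.zeros((seq_len, d_out))
    for t in range(seq_len):
        for i in range(k):
            # tap i looks (k-1-i)*dilation steps into the past
            out[t] += xp[t + i * dilation] @ w[i]
    return out
```

Stacking layers with dilations 1, 2, 4, ... gives the multi-scale receptive field the abstract refers to.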
A feature-attention-based multi-stage network for image deblurring is proposed. The model incorporates a feature attention module composed of channel attention and pixel attention mechanisms, which concentrates attention on blurred pixels and important channel information and effectively handles the uneven distribution of blur within images. We also introduce an atrous residual block and a context module between the encoder and the decoder: atrous convolutions are combined with residual connections, and the context module adopts multi-layer atrous convolutions, which effectively enlarge the network's receptive field and better capture multi-scale contextual information. Experiments were conducted on the public GoPro dataset to evaluate the performance of our method. The results show that the proposed model reaches a PSNR of 30.51 dB with a processing time of 0.035 s per image, outperforming most current deblurring methods.
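The two halves of the feature attention module can be sketched in a few lines of NumPy. This is an illustrative sketch under assumed shapes, not the paper's network: channel attention squeezes each channel to one statistic and gates channels, while pixel attention produces a spatial gate so blurred regions can receive more weight.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (c, h, w). Global average pool -> two-layer MLP -> per-channel gate."""
    gap = x.mean(axis=(1, 2))                       # (c,) one statistic per channel
    gate = sigmoid(w2 @ np.maximum(w1 @ gap, 0.0))  # (c,) channel weights in (0, 1)
    return x * gate[:, None, None]                  # reweight informative channels

def pixel_attention(x, wp):
    """x: (c, h, w); wp: (c,) acts as a 1x1 conv producing a spatial gate."""
    gate = sigmoid(np.tensordot(wp, x, axes=1))     # (h, w) per-pixel weights
    return x * gate[None, :, :]                     # emphasize, e.g., blurred pixels
```

Applying channel attention followed by pixel attention realizes the combined feature attention the abstract describes.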
Extractive reading comprehension extracts a consecutive subsequence from a given article to answer a given question. Previous work often adopted byte pair encoding (BPE), which can separate semantically correlated words. Moreover, previous feature-extraction strategies cannot effectively capture global semantic information. In this paper, an extractive model with enhanced spatial-temporal information and span mask encoding (ESSM) is proposed to strengthen global semantic information. ESSM uses an embedding layer to reduce the segmentation of correlated words and a TemporalConvNet layer to mitigate the loss of feature information. The model can also handle unanswerable questions. To verify its effectiveness, experiments were conducted on the SQuAD1.1 and SQuAD2.0 datasets. Our model achieved an EM of 86.31% and an F1 score of 92.49% on SQuAD1.1, and 80.54% and 83.27% respectively on SQuAD2.0, demonstrating that the model is effective for the extractive QA task.
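The "extract a consecutive subsequence" step common to such models can be sketched as follows: score every token as a possible span start and end, then pick the highest-scoring valid span. This is a generic NumPy sketch of extractive span selection with assumed names and a length cap, not the ESSM decoder itself.

```python
import numpy as np

def best_span(hidden, w_start, w_end, max_len=30):
    """Pick the best answer span from token representations.

    hidden:  (seq_len, d) encoder outputs for the passage tokens
    w_start: (d,) projection giving per-token start logits
    w_end:   (d,) projection giving per-token end logits
    Returns (start, end) token indices with end >= start and
    span length at most max_len.
    """
    start_logits = hidden @ w_start
    end_logits = hidden @ w_end
    best, best_score = (0, 0), -np.inf
    for s in range(len(hidden)):
        for e in range(s, min(s + max_len, len(hidden))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best
```

For unanswerable questions (as in SQuAD2.0), models typically compare the best span score against a no-answer score from a special token, answering only when the span wins.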