Recently, transformer self-attention mechanisms have demonstrated significant advantages in deep learning and have been extensively used for natural language processing and video tracking. Self-attention mechanisms have also been applied to hyperspectral unmixing. Although self-attention is usually an efficient and flexible tool, the original transformer may break the inner structure of the data during learning, which degrades unmixing performance. In this work, we employ transformer self-attention mechanisms to build a deep self-embedded transformer network (DSET-Net) for hyperspectral unmixing. The proposed DSET-Net adopts an autoencoder framework and achieves parameter sharing between local and overall features in the encoder through a 'Transformer in Transformer (TNT)' strategy. DSET-Net preserves the spatial details of hyperspectral images and involves only one convolution operation in the encoder, substantially improving learning performance. The effectiveness of the proposed method is evaluated on real hyperspectral datasets, and the experimental results indicate that DSET-Net is highly competitive with other state-of-the-art approaches.
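For illustration, the following is a minimal PyTorch sketch of a TNT-style unmixing autoencoder of the kind the abstract describes; it is not the authors' implementation. The layer dimensions, the placement of the single (1x1) convolution, and the local-to-global fusion path (`local_to_global`) are assumptions inferred from the abstract's description.

```python
# Minimal sketch (assumptions, not the authors' code) of a TNT-style
# autoencoder for hyperspectral unmixing.
import torch
import torch.nn as nn

class TNTBlock(nn.Module):
    """Transformer-in-Transformer block: an inner transformer refines
    pixel-level (local) tokens within each patch, and their summary is
    fused into an outer transformer over patch-level (overall) tokens,
    so local and overall features share parameters through this path."""
    def __init__(self, inner_dim, outer_dim, pixels_per_patch, heads=4):
        super().__init__()
        self.inner = nn.TransformerEncoderLayer(
            d_model=inner_dim, nhead=heads, batch_first=True)
        self.outer = nn.TransformerEncoderLayer(
            d_model=outer_dim, nhead=heads, batch_first=True)
        # Hypothetical fusion: flatten inner tokens into the outer token.
        self.local_to_global = nn.Linear(inner_dim * pixels_per_patch, outer_dim)

    def forward(self, pixel_tokens, patch_tokens):
        # pixel_tokens: (B*P, pixels_per_patch, inner_dim)
        # patch_tokens: (B, P, outer_dim)
        B, P, _ = patch_tokens.shape
        pixel_tokens = self.inner(pixel_tokens)
        fused = self.local_to_global(pixel_tokens.flatten(1)).view(B, P, -1)
        patch_tokens = self.outer(patch_tokens + fused)
        return pixel_tokens, patch_tokens

class UnmixingAutoencoder(nn.Module):
    def __init__(self, bands, n_endmembers, pixels_per_patch,
                 inner_dim=32, outer_dim=64):
        super().__init__()
        self.pixel_embed = nn.Linear(bands, inner_dim)
        self.patch_embed = nn.Linear(bands * pixels_per_patch, outer_dim)
        self.tnt = TNTBlock(inner_dim, outer_dim, pixels_per_patch)
        # The encoder's single convolution; a 1x1 conv mapping patch
        # features to abundances is an assumption.
        self.conv = nn.Conv1d(outer_dim, n_endmembers, kernel_size=1)
        self.softmax = nn.Softmax(dim=1)  # nonnegative, sum-to-one abundances
        # Bias-free linear decoder: its weight matrix plays the role of
        # the estimated endmember spectra.
        self.decoder = nn.Linear(n_endmembers, bands, bias=False)

    def forward(self, patches):
        # patches: (B, P, pixels_per_patch, bands)
        B, P, N, C = patches.shape
        pixel_tokens = self.pixel_embed(patches.view(B * P, N, C))
        patch_tokens = self.patch_embed(patches.view(B, P, N * C))
        _, patch_tokens = self.tnt(pixel_tokens, patch_tokens)
        abund = self.softmax(self.conv(patch_tokens.transpose(1, 2)))
        recon = self.decoder(abund.transpose(1, 2))  # (B, P, bands)
        return recon, abund

# Usage with hypothetical sizes: 200 bands, 4 endmembers, 3x3-pixel patches.
model = UnmixingAutoencoder(bands=200, n_endmembers=4, pixels_per_patch=9)
recon, abund = model(torch.randn(2, 16, 9, 200))
```

In this sketch, the softmax enforces the abundance nonnegativity and sum-to-one constraints common in unmixing autoencoders, and reconstruction follows the linear mixing model, with the decoder weights acting as the endmember estimates.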