
Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery

Overview
Journal: Entropy (Basel)
Publisher: MDPI
Date: 2022 Nov 11
PMID: 36359709
Abstract

Convolutional neural networks have long dominated semantic segmentation of very-high-resolution (VHR) remote sensing (RS) images. However, restricted by the fixed receptive field of the convolution operation, convolution-based models cannot directly obtain contextual information. Swin Transformer, by contrast, shows great potential for modeling long-range dependencies. Nevertheless, Swin Transformer splits images into patches that are treated as one-dimensional sequences, without addressing the loss of position information inside each patch. Therefore, inspired by Swin Transformer and Unet, we propose SUD-Net (Swin transformer-based Unet-like with Dynamic attention pyramid head Network), a new U-shaped architecture that combines Swin Transformer blocks and convolution layers through a dual encoder and an upsampling decoder, with a Dynamic Attention Pyramid Head (DAPH) attached to the backbone. First, we propose a dual encoder structure that combines Swin Transformer blocks and reslayers in reverse order to complement global semantics with detailed representations. Second, to address the spatial loss problem inside each patch, we design a Multi-Path Fusion Model (MPFM) with a specially devised Patch Attention (PA) to encode the position information of patches and adaptively fuse features of different scales through attention mechanisms. Third, we construct a Dynamic Attention Pyramid Head with deformable convolution to dynamically aggregate effective and important semantic information. On the ISPRS Potsdam dataset, SUD-Net achieves 92.51% mF1, 86.4% mIoU, and 92.98% OA; on the ISPRS Vaihingen dataset, it achieves 89.49% mF1, 81.26% mIoU, and 90.95% OA.
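The abstract reports mF1, mIoU, and OA without defining them. The following is a minimal sketch of how these three scores are commonly computed from a per-pixel confusion matrix, assuming the standard definitions (overall accuracy, class-averaged F1, class-averaged intersection over union). The function name `segmentation_metrics` and the toy matrix are hypothetical and for illustration only; the exact evaluation protocol behind the reported numbers (for example, which classes are included in the averages) is not stated here and may differ.

```python
def segmentation_metrics(confusion):
    """Compute OA, mean F1, and mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j. Standard definitions are assumed; this
    is not taken from the paper's evaluation code.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))

    f1_scores, ious = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # true c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp    # predicted c, true other
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        # A class absent from both truth and prediction scores 0 here.
        f1_scores.append(2 * tp / f1_den if f1_den else 0.0)
        ious.append(tp / iou_den if iou_den else 0.0)

    return {
        "OA": correct / total,
        "mF1": sum(f1_scores) / n,
        "mIoU": sum(ious) / n,
    }


# Toy two-class example (hypothetical pixel counts, not data from the paper).
toy_confusion = [[50, 10],
                 [5, 35]]
metrics = segmentation_metrics(toy_confusion)
print({name: round(value, 4) for name, value in metrics.items()})
```

For any single class, F1 is never lower than IoU, which is consistent with the reported mF1 values being higher than the corresponding mIoU values on both datasets.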

Citing Articles

Fault diagnosis in electric motors using multi-mode time series and ensemble transformers network.

Xu B, Li H, Ding R, Zhou F. Sci Rep. 2025; 15(1):7834.

PMID: 40050311 PMC: 11885605. DOI: 10.1038/s41598-025-89695-6.
