Building Rooftop Extraction from High Resolution Aerial Images Using Multiscale Global Perceptron with Spatial Context Refinement

Overview
Journal Sci Rep
Specialty Science
Date 2025 Feb 22
PMID 39987354
Abstract

Building rooftop extraction has been applied in many fields, such as cartography, urban planning, autonomous driving, and smart city construction. Automatic building detection and extraction algorithms applied to high-spatial-resolution aerial images can provide precise location and geometry information while significantly reducing time, cost, and labor. Recently, deep learning algorithms, especially convolutional neural networks (CNNs) and Transformers, have shown strong local or global feature extraction abilities and have outperformed conventional methods in intelligent image interpretation. However, buildings often exhibit scale variation, spectral heterogeneity and similarity, and complex geometric shapes. As a result, rooftop extraction results from these methods tend to be fragmented and lack spatial detail. To address these issues, this study developed a multi-scale global perceptron network that combines Transformer and CNN components in a novel encoder-decoder architecture to enhance the contextual representation of buildings. Specifically, an improved multi-head attention encoder constructs multi-scale tokens to strengthen global semantic correlations. Meanwhile, a context refinement decoder jointly uses high-level semantic representations and shallow features to restore spatial details. Quantitative analysis and visual experiments confirmed that the proposed model is more efficient than, and superior to, other state-of-the-art methods, achieving an F1 score of 95.18% on the WHU dataset and 93.29% on the Massub dataset.
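The abstract reports F1 scores (95.18% on WHU, 93.29% on Massub) without defining the metric. As an illustration only, the sketch below computes the standard pixel-level F1 for the building class of a binary mask, F1 = 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall. It assumes pixel-wise evaluation with counts pooled over the mask; the paper's exact protocol is not stated in the abstract, and the function name `pixel_f1` and the `Mask` type are hypothetical.

```python
from typing import Sequence

# Hypothetical type: a 2-D binary mask where 1 marks a building pixel, 0 marks background.
Mask = Sequence[Sequence[int]]


def pixel_f1(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 score for the building class of two binary masks.

    Assumption: TP/FP/FN are pooled over all pixels of the mask; this is an
    illustrative sketch, not the evaluation code used in the paper.
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("prediction and ground-truth masks must have the same shape")

    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # building pixel correctly predicted as building
            elif p:
                fp += 1  # background pixel predicted as building
            elif t:
                fn += 1  # building pixel missed by the prediction

    denom = 2 * tp + fp + fn
    if denom == 0:
        # Neither mask contains building pixels: F1 is undefined; return 0.0 by convention.
        return 0.0
    # Same value as 2 * precision * recall / (precision + recall) whenever TP > 0.
    return 2 * tp / denom


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(round(pixel_f1(pred, truth), 4))  # 0.6667
```

Here TP = 2, FP = 1, and FN = 1, so F1 = 4/6. For dataset-level scores such as those in the abstract, TP/FP/FN are often accumulated over all test tiles before taking the ratio rather than averaging per-tile F1; which convention the authors used is not stated here.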
