Building Rooftop Extraction from High Resolution Aerial Images Using Multiscale Global Perceptron with Spatial Context Refinement

Overview

Journal Sci Rep

Specialty Science

Date 2025 Feb 22

PMID 39987354

Authors

Qinglie Yuan

Affiliations

Soon will be listed here.

Abstract

Building rooftop extraction has been applied in various fields, such as cartography, urban planning, automatic driving, and intelligent city construction. Automatic building detection and extraction algorithms using high spatial resolution aerial images can provide precise location and geometry information, significantly reducing time, costs, and labor. Recently, deep learning algorithms, especially convolution neural networks (CNNs) and Transformer, have robust local or global feature extraction ability, achieving advanced performance in intelligent interpretation compared with conventional methods. However, buildings often exhibit scale variation, spectral heterogeneity, and similarity with complex geometric shapes. Hence, the building rooftop extraction results exist fragmentation and lack spatial details using these methods. To address these issues, this study developed a multi-scale global perceptron network based on Transformer and CNN using novel encoder-decoders for enhancing contextual representation of buildings. Specifically, an improved multi-head-attention encoder is employed by constructing multi-scale tokens to enhance global semantic correlations. Meanwhile, the context refinement decoder is developed and synergistically uses high-level semantic representation and shallow features to restore spatial details. Overall, quantitative analysis and visual experiments confirmed that the proposed model is more efficient and superior to other state-of-the-art methods, with a 95.18% F1 score on the WHU dataset and a 93.29% F1 score on the Massub dataset.

References

Badrinarayanan V, Kendall A, Cipolla R . SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans Pattern Anal Mach Intell. 2017; 39(12):2481-2495. DOI: 10.1109/TPAMI.2016.2644615. View

Shi Y, Li Q, Zhu X . Building segmentation through a gated graph convolutional neural network with deep structured feature embedding. ISPRS J Photogramm Remote Sens. 2020; 159:184-197. PMC: 6946440. DOI: 10.1016/j.isprsjprs.2019.11.004. View

Zhou Z, Siddiquee M, Tajbakhsh N, Liang J . UNet++: A Nested U-Net Architecture for Medical Image Segmentation. Deep Learn Med Image Anal Multimodal Learn Clin Decis Support (2018). 2020; 11045:3-11. PMC: 7329239. DOI: 10.1007/978-3-030-00889-5_1. View

Li M, Rui J, Yang S, Liu Z, Ren L, Ma L . Method of Building Detection in Optical Remote Sensing Images Based on SegFormer. Sensors (Basel). 2023; 23(3). PMC: 9920730. DOI: 10.3390/s23031258. View