
MixImages: An Urban Perception AI Method Based on Polarization Multimodalities

Overview
Journal: Sensors (Basel)
Publisher: MDPI
Specialty: Biotechnology
Date: 2024 Aug 10
PMID: 39123941
Abstract

Intelligent urban perception is an active research topic. Most previous urban perception models based on semantic segmentation take RGB images as their only input modality. In natural urban scenes, however, the interplay of light and shadow often produces confused RGB features, which weakens the model's perception ability. Multimodal polarization data carry information dimensions beyond RGB and can enhance the representation of shadow regions, making them useful as auxiliary input. In addition, transformers have achieved outstanding performance on visual tasks in recent years, and their large effective receptive field can provide more discriminative cues for shadow regions. For these reasons, this study proposes a novel semantic segmentation model, MixImages, which can incorporate polarization data for pixel-level perception. We conducted comprehensive experiments on a polarization dataset of urban scenes. In the unimodal benchmark, MixImages achieved an accuracy advantage of 3.43% over the control-group model using only RGB images; in the multimodal benchmark, it gained a performance improvement of 4.29%. To provide a reference for specific downstream tasks, we also tested how different combinations of polarization types affect overall segmentation accuracy. MixImages offers a new option for urban scene perception tasks.
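
The abstract does not say which polarization representations MixImages takes as input. As an illustration of what "information dimensions beyond RGB" can mean, a common choice in polarization imaging is to derive the degree and angle of linear polarization (DoLP, AoLP) from intensities measured behind polarizers at 0°, 45°, 90°, and 135°. The sketch below computes these per pixel from the linear Stokes parameters; the function names and the four-angle acquisition are assumptions made for illustration, not details taken from the paper.

```python
import math


def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-angle intensities.

    Assumes (hypothetically) a sensor that measures intensity behind
    linear polarizers at 0, 45, 90, and 135 degrees.
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical component
    s2 = i45 - i135                     # +45 vs. -45 degree component
    return s0, s1, s2


def dolp_aolp(i0, i45, i90, i135, eps=1e-12):
    """Degree (0..1) and angle (radians) of linear polarization for one pixel."""
    s0, s1, s2 = stokes_from_intensities(i0, i45, i90, i135)
    dolp = math.hypot(s1, s2) / max(s0, eps)  # eps guards against division by zero
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp
```

For example, `dolp_aolp(1.0, 0.5, 0.0, 0.5)` describes fully horizontally polarized light (DoLP of 1, AoLP of 0), while four equal intensities describe unpolarized light (DoLP of 0). Maps like these could be stacked with RGB channels as extra input modalities for a segmentation model.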
