
Fusion of Visible and Infrared Aerial Images from Uncalibrated Sensors Using Wavelet Decomposition and Deep Learning

Overview
Journal Sensors (Basel)
Publisher MDPI
Specialty Biotechnology
Date 2025 Jan 8
PMID 39771950
Abstract

Multi-modal systems extract information about the environment using specialized sensors, each optimized for the wavelengths at which the relevant phenomenology and material interactions occur. To maximize entropy, complementary systems operating in non-overlapping wavelength regions are optimal. VIS-IR (visible-infrared) systems have been at the forefront of multi-modal fusion research and are used extensively to represent information in all-day, all-weather applications. Prior to image fusion, the image pairs must be properly registered and mapped to a common resolution palette. However, due to differences in the device physics of image capture, information from VIS and IR sensors cannot be directly correlated, which is a major bottleneck for this area of research. In the absence of camera metadata, image registration is performed manually, which is not practical for large datasets. Most published work in this area assumes calibrated sensors and the availability of camera metadata providing registered image pairs, which limits the generalization capability of these systems. In this work, we propose a novel end-to-end pipeline for image registration and fusion. First, we design a recursive crop-and-scale wavelet spectral decomposition (WSD) algorithm that automatically extracts the patch of visible data corresponding to the thermal information. After data extraction, both images are registered to a common resolution palette and forwarded to a DNN for image fusion. The fusion performance of the proposed pipeline is compared and quantified against state-of-the-art classical and DNN architectures on open-source and custom datasets, demonstrating the efficacy of the pipeline. Furthermore, we also propose a novel keypoint-based metric for quantifying the quality of the fused output.
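The abstract does not describe the WSD search in detail; the sketch below (Python, assuming the pywt and OpenCV libraries) illustrates one plausible reading of a recursive crop-and-scale search, not the authors' implementation. At each level, the current visible window is shrunk and shifted, and the candidate whose low-frequency wavelet signature correlates best with the thermal frame is kept. Names such as recursive_crop_search and parameters such as shrink are hypothetical.

```python
# Hypothetical sketch: locate the visible-image patch whose wavelet
# signature best matches the thermal image (grayscale numpy arrays).
import cv2
import numpy as np
import pywt

def wavelet_signature(img, wavelet="haar", level=2):
    """Low-frequency wavelet signature: approximation coefficients of a
    multi-level 2-D DWT, normalized to zero mean and unit variance."""
    coeffs = pywt.wavedec2(img.astype(np.float32), wavelet, level=level)
    approx = coeffs[0]
    return (approx - approx.mean()) / (approx.std() + 1e-8)

def match_score(vis_patch, ir_img):
    """Correlation between the wavelet signatures of a visible crop
    (resampled to the thermal resolution) and the thermal frame."""
    resized = cv2.resize(vis_patch, (ir_img.shape[1], ir_img.shape[0]))
    sig_v, sig_t = wavelet_signature(resized), wavelet_signature(ir_img)
    h = min(sig_v.shape[0], sig_t.shape[0])
    w = min(sig_v.shape[1], sig_t.shape[1])
    return float((sig_v[:h, :w] * sig_t[:h, :w]).mean())

def recursive_crop_search(vis, ir, depth=3, shrink=0.8):
    """Greedy coarse-to-fine search: at each level, try shrunken crops
    shifted within the current window and keep the best-scoring one."""
    y0, x0, y1, x1 = 0, 0, vis.shape[0], vis.shape[1]
    for _ in range(depth):
        best_score = match_score(vis[y0:y1, x0:x1], ir)
        best_box = (y0, x0, y1, x1)
        h, w = y1 - y0, x1 - x0
        nh, nw = int(h * shrink), int(w * shrink)
        for dy in (0, (h - nh) // 2, h - nh):
            for dx in (0, (w - nw) // 2, w - nw):
                cy0, cx0 = y0 + dy, x0 + dx
                score = match_score(vis[cy0:cy0 + nh, cx0:cx0 + nw], ir)
                if score > best_score:
                    best_score = score
                    best_box = (cy0, cx0, cy0 + nh, cx0 + nw)
        y0, x0, y1, x1 = best_box
    return y0, x0, y1, x1  # bounding box of the best-matching visible patch
```

The returned box delimits the visible region assumed to cover the thermal field of view; both images can then be resampled to a common resolution before fusion.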

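The keypoint-based quality metric is likewise only named in the abstract. A minimal sketch of one plausible construction follows (OpenCV assumed; keypoint_retention and the ratio threshold are hypothetical): a fused image is scored by the fraction of ORB keypoints from each source whose descriptors can still be matched in the fused output.

```python
# Hypothetical keypoint-based fusion-quality score, not the paper's metric.
import cv2

def keypoint_retention(src, fused, ratio=0.75):
    """Fraction of ORB keypoints in `src` whose descriptors find a
    ratio-test match among the fused image's descriptors."""
    orb = cv2.ORB_create(nfeatures=500)
    kp_s, des_s = orb.detectAndCompute(src, None)
    kp_f, des_f = orb.detectAndCompute(fused, None)
    if des_s is None or des_f is None or len(kp_s) == 0:
        return 0.0
    matches = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des_s, des_f, k=2)
    good = [p for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / len(kp_s)

def fusion_quality(vis, ir, fused):
    """Average retention over both modalities: a fused image that keeps
    salient structure from both VIS and IR scores closer to 1."""
    return 0.5 * (keypoint_retention(vis, fused) + keypoint_retention(ir, fused))
```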