Local Kernel Renormalization As a Mechanism for Feature Learning in Overparametrized Convolutional Neural Networks

Overview

Journal Nat Commun

Date 2025 Jan 10

PMID 39794337

Authors

R Aiudi

R Pacelli

P Baglioni

A Vezzani

R Burioni

P Rotondo

Abstract

Empirical evidence shows that fully-connected neural networks in the infinite-width limit (lazy training) eventually outperform their finite-width counterparts in most computer vision tasks; on the other hand, modern architectures with convolutional layers often achieve optimal performance in the finite-width regime. In this work, we present a theoretical framework that provides a rationale for these differences in one-hidden-layer networks; we derive an effective action in the so-called proportional limit for an architecture with one convolutional hidden layer and compare it with the result available for fully-connected networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the fully-connected architecture is just globally renormalized by a single scalar parameter, the convolutional kernel undergoes a local renormalization, meaning that the network can select the local components that will contribute to the final prediction in a data-dependent way. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow convolutional neural networks, but not in shallow fully-connected architectures or in locally connected neural networks without weight sharing.
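The contrast between global and local renormalization can be illustrated with a toy sketch. This is not the authors' derivation. It assumes a linear patch kernel K_i(x, x') = x_i · x'_i / M over non-overlapping patches of size M. It also uses hypothetical renormalization weights: one scalar q in the global case, and one weight q_i per patch in a simplified, diagonal local case. The paper's actual order parameters may have a richer structure. The sketch shows only the *form* of the two kernels, not why weight sharing produces the local one. All function names are illustrative.

```python
from typing import List

Vector = List[float]


def patches(x: Vector, m: int) -> List[Vector]:
    """Split x into non-overlapping patches of length m (toy assumption)."""
    assert len(x) % m == 0, "input length must be a multiple of the patch size"
    return [x[i:i + m] for i in range(0, len(x), m)]


def patch_kernels(x: Vector, y: Vector, m: int) -> List[float]:
    """Hypothetical linear patch kernels K_i(x, y) = x_i . y_i / m, one per patch i."""
    return [sum(a * b for a, b in zip(px, py)) / m
            for px, py in zip(patches(x, m), patches(y, m))]


def global_kernel(x: Vector, y: Vector, m: int, q: float) -> float:
    """Global renormalization: a single scalar q rescales the patch-averaged kernel,
    so the relative contribution of every patch is fixed."""
    ks = patch_kernels(x, y, m)
    return q * sum(ks) / len(ks)


def local_kernel(x: Vector, y: Vector, m: int, qs: List[float]) -> float:
    """Local renormalization (simplified, diagonal assumption): each patch i gets
    its own weight q_i, so some local components can be amplified or suppressed."""
    ks = patch_kernels(x, y, m)
    assert len(qs) == len(ks), "need one weight per patch"
    return sum(q * k for q, k in zip(qs, ks)) / len(ks)


if __name__ == "__main__":
    x, y = [1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, -1.0]
    print(global_kernel(x, y, 2, 2.0))        # 0.0: patch contributions cancel for any q
    print(local_kernel(x, y, 2, [2.0, 0.0]))  # 0.5: only the first patch is selected
```

In this toy case the two inputs agree on their first patch and disagree on the second. No choice of the global scalar q can make them look similar, because it rescales both patch contributions equally. Under the local weighting, emphasizing the first patch does make them similar. This mirrors, in a deliberately simplified way, the data-dependent selection of local components described in the abstract.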
