
Local Kernel Renormalization As a Mechanism for Feature Learning in Overparametrized Convolutional Neural Networks

Overview
Journal: Nat Commun
Date: 2025 Jan 10
PMID: 39794337
Abstract

Empirical evidence shows that fully-connected neural networks in the infinite-width limit (lazy training) eventually outperform their finite-width counterparts in most computer vision tasks; on the other hand, modern architectures with convolutional layers often achieve optimal performance in the finite-width regime. In this work, we present a theoretical framework that provides a rationale for these differences in one-hidden-layer networks; we derive an effective action in the so-called proportional limit for an architecture with one convolutional hidden layer and compare it with the result available for fully-connected networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the fully-connected architecture is globally renormalized by a single scalar parameter, the convolutional kernel undergoes a local renormalization, meaning that the network can select, in a data-dependent way, the local components that will contribute to the final prediction. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow convolutional neural networks, but not in shallow fully-connected architectures or in locally connected neural networks without weight sharing.
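
The structural contrast between the two renormalization mechanisms can be sketched in a few lines of code. In the fully-connected case the predictor's kernel is rescaled by a single scalar order parameter; in the convolutional case each local (patch-wise) kernel component receives its own data-dependent weight. The Python snippet below is a schematic illustration of that contrast only: the kernels are random placeholders, and the choice of a full matrix of weights over patch pairs is an assumption made for illustration, not the exact object derived in the paper's effective action.

```python
import numpy as np

def global_renormalized_kernel(K, Q_bar):
    """Fully-connected case: the whole kernel Gram matrix is
    rescaled by a single scalar order parameter Q_bar."""
    return Q_bar * K

def local_renormalized_kernel(K_patches, Q_bar_matrix):
    """Convolutional case (schematic): each local kernel component
    K_patches[i, j] -- built from patch i of one input and patch j of
    the other -- gets its own weight Q_bar_matrix[i, j].
    The index structure is an illustrative assumption."""
    # Weighted sum over local components: sum_ij Q_ij * K^(ij)
    return np.einsum('ij,ijab->ab', Q_bar_matrix, K_patches)

# Toy example: 2 inputs, 3 patches per input, random placeholder kernels.
rng = np.random.default_rng(0)
n, P = 2, 3                      # number of inputs, number of patches
K = rng.standard_normal((n, n))  # placeholder global kernel Gram matrix
K_patches = rng.standard_normal((P, P, n, n))  # placeholder local components

K_fc = global_renormalized_kernel(K, Q_bar=0.7)
K_cnn = local_renormalized_kernel(K_patches, rng.random((P, P)))
print(K_fc.shape, K_cnn.shape)   # both (2, 2): renormalized Gram matrices
```

If the global kernel is a suitably normalized sum of the local components, a constant weight matrix collapses the local rule back to the global rescaling; this is one way to see why local renormalization, which can emphasize or suppress individual components, is the strictly richer mechanism.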
