Efficient Visual Representation Learning with Heat Conduction Equation

Zhemin Zhang; Xun Gong

doi:10.24963/ijcai.2025/271

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence

Main Track. Pages 2431-2439. https://doi.org/10.24963/ijcai.2025/271

Foundation models, such as CNNs and ViTs, have powered the development of image representation learning. However, general guidance for model architecture design is still missing. Inspired by the connection between image representation learning and heat conduction, we model images with the heat conduction equation: image features are treated as temperatures, and their information interaction is modeled as the diffusion of thermal energy. From this perspective, many modern architectural components, such as residual structures, SE blocks, and feed-forward networks, can be interpreted in terms of the heat conduction equation. We therefore leverage the heat equation to design new and more interpretable models. As an example, we propose the Heat Conduction Layer and the Refinement Approximation Layer, inspired by solving the heat conduction equation with the finite difference method and with Fourier series, respectively. Although the main goal of this paper is to integrate the overall architectural design of neural networks into the theoretical framework of heat conduction, our Heat Conduction Network (HcNet) also shows competitive performance, e.g., HcNet-T achieves 83.0% top-1 accuracy on ImageNet-1K while requiring only 28M parameters and 4.1G MACs. The code is publicly available at: https://github.com/ZheminZhang1/HcNet.
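To make the finite difference idea concrete, the following is a minimal sketch of one explicit finite-difference step of the 2D heat equation on a small grid of "temperatures". It is a textbook illustration only, not the Heat Conduction Layer from the paper or the code in the HcNet repository. The function name `heat_step`, the coefficient `k` (standing in for diffusivity times time step over squared grid spacing), and the replicated-edge boundary handling are all assumptions made for this example.

```python
def heat_step(u, k=0.1):
    """One explicit finite-difference step of the 2D heat equation.

    u: list of rows, a 2D grid of feature values read as temperatures.
    k: hypothetical coefficient (alpha * dt / dx**2); the explicit scheme
       is stable for k <= 0.25 on a 2D grid.
    Edges are replicated (zero-flux boundary), an assumption of this sketch.
    """
    h, w = len(u), len(u[0])

    def at(i, j):
        # Clamp indices so an edge cell reuses its own value
        # in place of a neighbour that lies outside the grid.
        return u[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Discrete 5-point Laplacian: sum of the four neighbours minus 4x centre.
            lap = at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1) - 4 * u[i][j]
            # Update has the form "input + k * f(input)", i.e. a residual-style step.
            row.append(u[i][j] + k * lap)
        out.append(row)
    return out


# Usage: a single hot cell in the centre spreads to its four neighbours.
grid = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
next_grid = heat_step(grid, k=0.1)
```

The update adds a scaled function of the input back onto the input, which is the shape of a residual connection; this is one way to read the abstract's remark that residual structures admit a heat-equation interpretation. With the zero-flux boundary assumed here, the total "heat" on the grid is unchanged by a step.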

Keywords:

Computer Vision: CV: Recognition (object detection, categorization)

Machine Learning: ML: Deep learning architectures