Message Passing Neural Networks for Sound Source Localization

Published in 33rd Telecommunications Forum TELFOR, 2025

Sound source localization (SSL) is a fundamental problem in audio signal processing. Traditional SSL approaches often degrade in challenging acoustic conditions, while more recent deep learning-based methods struggle to generalize to unseen microphone arrays or to fully exploit the relationships between microphones in an array. We propose Graph-RelNet, a graph-based neural network that extends relation networks with residual graph convolutional layers to regress the direction of arrival (DoA) from generalized cross-correlation with phase transform (GCC-PHAT) features. Experiments on the TIMIT speech corpus demonstrate that Graph-RelNet consistently outperforms a regression-based adaptation of GNN-SSL, while also surpassing conventional steered response power (SRP) with delay-and-sum (D&S) beamforming at medium and high signal-to-noise ratios (SNRs), especially in scenarios with few microphones. At 5 microphones and SNR = 30 dB, Graph-RelNet reduces the mean absolute DoA error by up to 45%. The model also shows strong generalization to unseen microphone arrays and remains competitive under low-SNR conditions.
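GCC-PHAT, the input feature named above, divides the cross-power spectrum of a microphone pair by its magnitude so that only phase (that is, time-delay) information remains: R(tau) = IFFT( X(f) Y*(f) / |X(f) Y*(f)| ). The sketch below is a standard NumPy formulation for a single pair and is not the authors' feature extractor. The function name `gcc_phat`, the zero-padding length, and the `max_tau` lag limit are illustrative assumptions; the window length and lag range used in the paper are not stated in this abstract.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of x relative to y with GCC-PHAT (illustrative sketch).

    Returns (tau, cc): tau in seconds (positive when x lags y) and the
    cross-correlation over lags -max_shift..max_shift (zero lag in the middle).
    """
    n = len(x) + len(y)                  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # phase transform: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation (lags wrap around)

    max_shift = n // 2
    if max_tau is not None:              # assumed limit, e.g. mic spacing / speed of sound
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so index max_shift corresponds to zero lag: negative lags, then 0..max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs, cc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(4000)        # white-noise source signal
    fs = 16000
    y = s[10:3010]
    x = s[5:3005]                        # x[n] = y[n - 5]: x lags y by 5 samples
    tau, cc = gcc_phat(x, y, fs, max_tau=0.001)
    print(round(tau * fs))               # → 5
```

For an array with M microphones, repeating this for each of the M(M-1)/2 pairs yields one correlation vector per pair. Pairwise vectors of this kind are a natural fit for relation-network and graph formulations, although exactly how Graph-RelNet assembles them into node or edge features is not specified here.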

M. Marijan, M. Bjelić, "Message Passing Neural Networks for Sound Source Localization," 33rd Telecommunications Forum TELFOR, 2025.