Abstract
Speech enhancement aims to improve the quality and intelligibility of speech by reducing or removing noise and distortion while preserving the natural characteristics of the underlying speech. Traditional convolutional encoder–decoder architectures often rely on static convolutional layers with fixed kernel sizes, which limits their adaptability to varying noisy environments. In this work, we propose a dynamic multi-kernel convolutional network, a convolutional encoder–decoder framework that incorporates an attention mechanism and noise-aware feature processing to adapt to different noise characteristics and speech features. The model dynamically modifies its convolutional kernels based on input features and estimates noise characteristics to enhance robustness under diverse noise conditions. Experiments were conducted on multiple publicly available datasets, including Texas Instruments/MIT (TIMIT), Wall Street Journal Speaker-Independent 84 (WSJ0-SI84), and VoiceBank+DEMAND. The proposed model achieves an STOI improvement of 17.62 % and a PESQ gain of 1.18 on TIMIT, and a PESQ of 3.15 with an STOI of 95.1 % on VoiceBank+DEMAND, outperforming several benchmark methods. Furthermore, the model maintains low computational complexity, with 2.73 million parameters, 1.99 G MACs/s, and a real-time factor of 0.21, making it suitable for real-time deployment. Although the model performs strongly, the evaluation is currently limited to publicly available datasets and moderate noise conditions; future work will explore highly adverse real-world environments.
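The idea of input-dependent, multi-kernel convolution described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not specify in detail. It assumes a 1-D signal, three parallel branches with fixed kernels of different sizes, and a toy gate that turns the global average of the input into softmax attention weights over the branches. All names (`dynamic_multi_kernel`, `gate_weights`, `gate_biases`) and all values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation; output length equals input length.

    Assumes an odd kernel length so the kernel is centred on each sample.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def dynamic_multi_kernel(signal, kernels, gate_weights, gate_biases):
    """Fuse several kernel-size branches with input-dependent attention.

    Hypothetical gate: one scalar score per branch, computed from the
    global average of the input, then normalised with softmax.
    """
    pooled = sum(signal) / len(signal)
    scores = [w * pooled + b for w, b in zip(gate_weights, gate_biases)]
    attn = softmax(scores)
    branches = [conv1d_same(signal, k) for k in kernels]
    fused = [
        sum(a * branch[i] for a, branch in zip(attn, branches))
        for i in range(len(signal))
    ]
    return fused, attn


# Illustrative values only: kernel sizes 1, 3 and 5 (moving averages).
kernels = [[1.0], [1 / 3] * 3, [0.2] * 5]
gate_weights = [1.0, 0.0, -1.0]
gate_biases = [0.0, 0.0, 0.0]

fused, attn = dynamic_multi_kernel(
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0], kernels, gate_weights, gate_biases
)
```

Because the attention weights are recomputed for every input, the effective kernel (the weighted mix of the branches) changes with the signal, which is the property a static fixed-kernel layer lacks. A trained model would learn the kernels and the gate rather than fix them by hand.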
| Original language | English |
|---|---|
| Article number | 132420 |
| Journal | Neurocomputing |
| Volume | 669 |
| State | Published - 7 Mar 2026 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2025 Elsevier B.V.
Keywords
- Attentive feature fusion
- Dynamic convolution
- Feature learning
- Multi-kernel convolution
- Speech enhancement
ASJC Scopus subject areas
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Dynamic multi-kernel convolutional network with noise injected features for audio-only speech enhancement'. Together they form a unique fingerprint.