Abstract
Deep learning has significantly advanced speech enhancement (SE) by exploiting hierarchical representations to model complex speech patterns. However, deploying these models on resource-constrained edge devices remains challenging because of computational limitations and real-time processing requirements. Convolutional neural networks (CNNs) are limited by frequency translation equivariance, which reduces their sensitivity to the frequency-specific features essential for speech-noise separation. Transformer-based SE models capture global dependencies effectively but are computationally expensive and less suitable for low-latency edge processing. To address these challenges, this study proposes an efficient encoder-decoder architecture optimized for SE on edge devices. The model integrates adaptive frequency-aware gated convolution (AFAGC) in the encoder and a Ginformer-based bottleneck, ensuring robust real-time performance with minimal computational overhead. The encoder incorporates adaptive frequency band positional encoding to mitigate translation equivariance, while gated convolution selectively reweights frequency components to emphasize speech-relevant features. The Ginformer-based bottleneck uses low-rank projections to reduce self-attention complexity and SRU-based temporal gating to enhance noise adaptation and computational efficiency. Evaluation on the VoiceBank+DEMAND dataset demonstrates that the proposed model outperforms recent SE models, achieving a PESQ of 3.25 and an STOI of 95.5%. With only 1.32 million parameters and a real-time factor (RTF) of 0.14, it delivers high-quality speech enhancement suitable for real-time deployment on edge devices.
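The abstract does not specify how the Ginformer bottleneck applies its low-rank projections. As a rough illustration of the general idea only, the sketch below uses a Linformer-style projection, a substituted technique and not the authors' implementation: keys and values are compressed along the time axis from n frames to r rows, so the attention map is n × r instead of n × n. All names (`low_rank_attention`, `proj_k`, `proj_v`) and the toy sizes are assumptions for illustration; the SRU-based gating is not modeled.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def low_rank_attention(q, k, v, proj_k, proj_v):
    """Linformer-style attention (illustrative, not the paper's Ginformer).

    q, k, v are (n, d) matrices; proj_k and proj_v are (r, n) projections
    with r << n, so the score matrix has shape (n, r) rather than (n, n).
    """
    d = len(q[0])
    k_small = matmul(proj_k, k)  # (r, d): keys compressed along time
    v_small = matmul(proj_v, v)  # (r, d): values compressed along time
    scores = matmul(q, transpose(k_small))  # (n, r)
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)
    return matmul(weights, v_small), weights


def random_matrix(rows, cols, rng):
    """Gaussian random matrix; stands in for learned weights and features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: 16 time frames, 4 features, rank 3.
rng = random.Random(0)
n_frames, dim, rank = 16, 4, 3
x = random_matrix(n_frames, dim, rng)
proj_k = random_matrix(rank, n_frames, rng)
proj_v = random_matrix(rank, n_frames, rng)
out, weights = low_rank_attention(x, x, x, proj_k, proj_v)
print(len(out), len(out[0]), len(weights[0]))  # prints: 16 4 3
```

With a fixed rank r, the cost of the score computation grows linearly in the number of frames rather than quadratically, which is the kind of saving the abstract attributes to the low-rank bottleneck.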
| Original language | English |
|---|---|
| Pages (from-to) | 12086-12095 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Consumer Electronics |
| Volume | 71 |
| Issue number | 4 |
| State | Published - 2025 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 1975-2011 IEEE.
Keywords
- Speech enhancement
- adaptive deep learning
- edge devices
- real-time processing
ASJC Scopus subject areas
- Media Technology
- Electrical and Electronic Engineering
Title
Lightweight Adaptive Deep Learning for Efficient Real-Time Speech Enhancement on Edge Devices