Lightweight Adaptive Deep Learning for Efficient Real-Time Speech Enhancement on Edge Devices

  • Fazal E. Wahab
  • Zhongfu Ye*
  • Nasir Saleem
  • Sami Bourouis
  • Amir Hussain

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

8 Scopus citations

Abstract

Deep learning has significantly advanced speech enhancement (SE) by exploiting hierarchical representations to model complex speech patterns. However, deploying these models on resource-constrained edge devices remains difficult because of limited compute and real-time processing requirements. Convolutional neural networks (CNNs) are translation-equivariant along the frequency axis, which reduces their sensitivity to the frequency-specific features essential for speech-noise separation. Transformer-based SE models capture global dependencies effectively but are computationally expensive and therefore less suitable for low-latency edge processing. To address these limitations, this study proposes an efficient encoder-decoder architecture optimized for SE on edge devices. The model integrates adaptive frequency-aware gated convolution (AFAGC) in the encoder and a Ginformer-based bottleneck, ensuring robust real-time performance with minimal computational overhead. The encoder incorporates adaptive frequency band positional encoding to mitigate translation equivariance, while gated convolution selectively reweights frequency components to emphasize speech-relevant features. The Ginformer-based bottleneck uses low-rank projections to reduce self-attention complexity and SRU-based temporal gating to enhance noise adaptation and computational efficiency. Evaluation on the VoiceBank+DEMAND dataset shows that the proposed model outperforms recent SE models, achieving a PESQ of 3.25 and an STOI of 95.5%. With only 1.32 million parameters and a real-time factor (RTF) of 0.14, it delivers high-quality speech enhancement suitable for real-time deployment on edge devices.
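
The two efficiency figures quoted in the abstract can be read with some simple arithmetic. The sketch below assumes the usual definition of real-time factor (processing time divided by the duration of the audio processed) and assumes float32 weights at 4 bytes per parameter; the abstract does not state the measurement hardware or the weight precision, and the `enhance` callable is a hypothetical stand-in for any enhancement model.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(enhance, samples, sample_rate: int) -> float:
    """Time a hypothetical `enhance` callable on `samples` and return its RTF."""
    start = time.perf_counter()
    enhance(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


def float32_megabytes(num_parameters: int) -> float:
    """Approximate weight storage, assuming 4 bytes per parameter (float32)."""
    return num_parameters * 4 / 1e6


# At an RTF of 0.14, 10 s of audio takes roughly 1.4 s to process.
print(real_time_factor(1.4, 10.0))

# 1.32 million parameters stored as float32 (an assumption, see above).
print(float32_megabytes(1_320_000))
```

An RTF below 1 means the model processes audio faster than it arrives, which is the basic condition for real-time use; the measured value depends on the device, so `measure_rtf` would need to be run on the target hardware to be meaningful.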

Original language: English
Pages (from-to): 12086-12095
Number of pages: 10
Journal: IEEE Transactions on Consumer Electronics
Volume: 71
Issue number: 4
State: Published - 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1975-2011 IEEE.

Keywords

  • Speech enhancement
  • adaptive deep learning
  • edge devices
  • real-time processing

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering
