Towards Personalised Audio Visual Speech Enhancement

Research output: Contribution to journal › Conference article › peer-review

Abstract

Personalised speech enhancement (PSE) and audio-visual (AV) speech enhancement (SE) have emerged as promising approaches to improve speech quality and intelligibility in challenging acoustic environments. PSE leverages individual-specific vocal characteristics to address the label permutation problem, while AV SE incorporates visual cues, particularly lip movements, to complement auditory signals in noisy conditions where speech is degraded by competing noise sources. This paper presents a novel framework that unifies the two approaches, advancing towards personalised AV SE. By integrating raw enrolment audio for adaptive target speaker representation with AV inputs, the proposed system aims to achieve robust SE in real-world environments. Experimental results demonstrate significant improvements in speech intelligibility and noise suppression on the COG-MHEAR Audio-Visual Speech Enhancement Challenge dataset, outperforming state-of-the-art PSE and AV SE models.

Original language: English
Pages (from-to): 4853-4857
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2025
Externally published: Yes
Event: 26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 17 Aug 2025 → 21 Aug 2025

Bibliographical note

Publisher Copyright:
© 2025 International Speech Communication Association. All rights reserved.

Keywords

  • multimodal processing
  • speech enhancement
  • speech separation

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Language and Linguistics
  • Modeling and Simulation
  • Human-Computer Interaction
