CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition

Sarah Alyami, Hamzah Luqman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Continuous sign language recognition (CSLR) focuses on interpreting and transcribing sequences of sign language gestures in videos. In this work, we propose CLIP sign language adaptation (CLIP-SLA), a novel CSLR framework that adapts the powerful pre-trained visual encoder of the CLIP model to sign language tasks through parameter-efficient fine-tuning (PEFT). We introduce two variants, SLA-Adapter and SLA-LoRA, which integrate PEFT modules into the CLIP visual encoder, enabling fine-tuning with minimal trainable parameters. The effectiveness of the proposed framework is validated on four datasets: Phoenix2014, Phoenix2014-T, CSL-Daily, and Isharah-500, where both CLIP-SLA variants outperform several SOTA models with fewer trainable parameters. Extensive ablation studies demonstrate the effectiveness and flexibility of the proposed methods with different vision-language models for CSLR. These findings showcase the potential of adapting large-scale pre-trained models for scalable and efficient CSLR, paving the way for future advancements in sign language understanding. Code is available at https://github.com/snalyami/CLIP-SLA.
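For readers unfamiliar with parameter-efficient fine-tuning, the sketch below illustrates the general LoRA recipe the abstract refers to: the pre-trained visual encoder is frozen and only small low-rank matrices injected into its linear projections are trained. This is an illustrative sketch, not the authors' CLIP-SLA implementation (see the linked repository for that); LoRALinear, ToyVisualEncoder, and inject_lora are hypothetical names, and the toy encoder merely stands in for CLIP's vision transformer.

# Minimal LoRA-style adaptation sketch (assumed PyTorch implementation, not the paper's code).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + scale * B(A(x))."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # the update starts at zero, so behavior is unchanged initially
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class ToyVisualEncoder(nn.Module):
    """Hypothetical stand-in for a CLIP-style ViT block, used only for illustration."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.attn_proj = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn_proj(x)
        return x + self.mlp(x)


def inject_lora(model: nn.Module, rank: int = 4) -> nn.Module:
    """Recursively replace every nn.Linear with a LoRA-wrapped version."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, LoRALinear(child, rank=rank))
        else:
            inject_lora(child, rank)
    return model


if __name__ == "__main__":
    encoder = inject_lora(ToyVisualEncoder())
    trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
    total = sum(p.numel() for p in encoder.parameters())
    print(f"trainable parameters: {trainable} / {total}")  # only the LoRA matrices are updated

In an actual CLIP-SLA-style setup, the same wrapping would be applied to the attention and MLP projections of a real CLIP visual encoder, so that only a small fraction of the parameters is optimized for the CSLR task.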

Original language: English
Title of host publication: Proceedings - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
Publisher: IEEE Computer Society
Pages: 4098-4108
Number of pages: 11
ISBN (Electronic): 9798331599942
DOIs
State: Published - 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025 - Nashville, United States
Duration: 11 Jun 2025 - 12 Jun 2025

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
Country/Territory: United States
City: Nashville
Period: 11/06/25 - 12/06/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Keywords

  • continuous sign language recognition
  • gesture recognition
  • parameter-efficient fine-tuning
  • sign language recognition
  • vision-language models

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
