Letz Translate: Low-Resource Machine Translation for Luxembourgish

  • Yewei Song*
  • Saad Ezzini*
  • Jacques Klein*
  • Tegawende Bissyande*
  • Clément Lefebvre
  • Anne Goujon
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

Natural language processing of Low-Resource Languages (LRLs) is often hampered by a lack of data, so achieving accurate machine translation (MT) in a low-resource setting is a real problem that calls for practical solutions. Research on multilingual models has shown that some LRLs can be handled by such models. However, their large size and computational requirements make them impractical in constrained environments (e.g., mobile/IoT devices or limited or older servers). In this paper, we address this problem by leveraging large multilingual MT models through knowledge distillation, which transfers knowledge from a large, complex teacher model to a smaller, simpler student model with little loss in performance. We also exploit high-resource languages that are related to, or share a linguistic root with, the target LRL. For our evaluation, we take Luxembourgish as the LRL, since it shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30% faster than the large state-of-the-art NLLB model while performing only 4% worse.
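The abstract names knowledge distillation without specifying which variant is used. As a rough illustration only, here is a minimal sketch of one common token-level form (after Hinton et al., 2015): the student is trained to match the teacher's temperature-softened output distribution, optionally mixed with the usual cross-entropy on the gold token. The function names, the temperature of 2.0, the `alpha` weighting, and the toy 3-token vocabulary are all hypothetical choices for illustration; the paper may instead use sequence-level distillation (training the student on NLLB's translations), which this sketch does not show.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) over softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures
    (Hinton et al., 2015). Hypothetical helper, not the paper's code.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


def combined_loss(teacher_logits: List[float],
                  student_logits: List[float],
                  gold_index: int,
                  alpha: float = 0.5,
                  temperature: float = 2.0) -> float:
    """Weighted sum of hard-label cross-entropy and the distillation term.

    alpha=1.0 ignores the teacher; alpha=0.0 learns from the teacher only.
    The weighting scheme is an illustrative assumption.
    """
    q = softmax(student_logits, 1.0)
    ce = -math.log(q[gold_index])  # cross-entropy against the reference token
    kd = distillation_loss(teacher_logits, student_logits, temperature)
    return alpha * ce + (1 - alpha) * kd


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # toy logits over a 3-token vocabulary
    student = [0.5, 1.5, 0.2]
    print(distillation_loss(teacher, student))
    print(combined_loss(teacher, student, gold_index=0))
```

In a real MT setting this per-token loss would be averaged over every target position and batch, and a framework loss such as PyTorch's `KLDivLoss` would normally replace the hand-written KL; plain Python is used here only to keep the sketch self-contained.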

Original language: English
Title of host publication: Proceedings - 2023 5th International Conference on Natural Language Processing, ICNLP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 165-170
Number of pages: 6
ISBN (Electronic): 9798350302219
DOIs
State: Published - 2023
Externally published: Yes
Event: 5th International Conference on Natural Language Processing, ICNLP 2023 - Hybrid, Guangzhou, China
Duration: 24 Mar 2023 to 26 Mar 2023

Publication series

Name: Proceedings - 2023 5th International Conference on Natural Language Processing, ICNLP 2023

Conference

Conference: 5th International Conference on Natural Language Processing, ICNLP 2023
Country/Territory: China
City: Hybrid, Guangzhou
Period: 24/03/23 to 26/03/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • Knowledge distillation
  • Low-resource Languages
  • Low-resource Translation
  • Luxembourgish
  • Neural Machine Translation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Signal Processing
