Arabic Compact Language Modelling for Resource Limited Devices

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Natural language modelling has gained considerable interest recently. Current state-of-the-art results are achieved by first training a very large language model and then fine-tuning it on multiple tasks. However, there is little work on smaller, more compact language models for resource-limited devices or applications, let alone on how to train such models efficiently for a low-resource language like Arabic. In this paper, we investigate how such models can be trained compactly for Arabic. We also show how distillation and quantization can be applied to create even smaller models. Our experiments show that our largest model, which is 2x smaller than the baseline, achieves better results on multiple tasks while using 2x less pretraining data.
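
The abstract gives no implementation details. As an illustration only, below is a minimal PyTorch-style sketch of the two compression techniques it mentions, knowledge distillation and post-training quantization; the student module, tensor shapes, and vocabulary size are hypothetical placeholders, not the paper's actual setup.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a temperature; the T^2 factor keeps
        # gradient magnitudes comparable across temperatures (Hinton et al., 2015).
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        return F.kl_div(log_student, soft_teacher,
                        reduction="batchmean") * temperature ** 2

    # Hypothetical stand-in for a compact student model (not the paper's architecture).
    student = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 32000))

    # One distillation step: match the student's logits to a frozen teacher's.
    teacher_logits = torch.randn(8, 32000)          # placeholder teacher output
    student_logits = student(torch.randn(8, 768))   # placeholder student output
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()

    # Post-training dynamic quantization: nn.Linear weights are stored in int8
    # and dequantized on the fly, typically shrinking the model roughly 4x.
    quantized_student = torch.quantization.quantize_dynamic(
        student, {nn.Linear}, dtype=torch.qint8
    )

In practice the distillation term would be combined with the standard task loss during training, with quantization applied afterwards as a final compression step.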

Original language: English
Title of host publication: WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop
Editors: Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Publisher: Association for Computational Linguistics (ACL)
Pages: 53-59
Number of pages: 7
ISBN (Electronic): 9781954085091
State: Published - 2021

Publication series

Name: WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop

Bibliographical note

Publisher Copyright:
© WANLP 2021 - 6th Arabic Natural Language Processing Workshop

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Software
  • Linguistics and Language
