Enhancing Arabic Information Retrieval for Question Answering

  • Muath Alghamdi
  • Mohammed Abushawarib
  • Mahmoud Ellouh
  • Mustafa Ghaleb
  • Muhamad Felemban

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Intelligent chatbots such as ChatGPT 3.5 and Google's Bard have shown remarkable competence in generic question-answering (QA) tasks. However, their performance falters on domain-specific QA, particularly in Arabic, a language known for its complex morphology and syntax. This paper presents a comprehensive approach to these issues, with the aim of building a chatbot tailored to a university community. We first create an extensive Arabic Q&A dataset by extracting data from academic documents with state-of-the-art Optical Character Recognition (OCR) tools. We then evaluate several text similarity measures, including pooled FastText word embeddings, the BM25 ranking function, and various semantic sentence embedding models. A thorough performance assessment reveals that the domain-specific model excels at both sentence-level similarity and context-relevance tasks. The resulting web chatbot, built with the LangChain library and Retrieval-Augmented Generation (RAG) methods, outperforms existing chatbots in domain-specific Arabic QA scenarios.
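As a rough illustration of the retrieval step the abstract describes, the following minimal sketch ranks passages with a BM25 scoring function and places the best match into a prompt, which is the basic shape of retrieval-augmented generation. It is not the authors' implementation: the `BM25` class, the whitespace `tokenize` helper, the parameter values `k1=1.5` and `b=0.75`, the English placeholder passages, and the prompt wording are all assumptions made for this example. Real Arabic text would also need normalization and morphological preprocessing that this sketch omits.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical helper: plain whitespace tokenization.
    # Arabic text would need normalization and stemming on top of this.
    return text.lower().split()


class BM25:
    """Minimal BM25 scorer over a small in-memory passage list."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed value)
        self.b = b    # length normalization strength (assumed value)
        self.docs = [tokenize(d) for d in docs]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]

        # Document frequency: number of passages containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        # Smoothed IDF that stays non-negative for very common terms.
        self.idf = {
            t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()
        }

    def score(self, query, i):
        # Sum the BM25 contribution of each query term found in passage i.
        total = 0.0
        for t in tokenize(query):
            f = self.tf[i].get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (
                1 - self.b + self.b * self.doc_len[i] / self.avgdl
            )
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=1):
        # Indices of the k highest-scoring passages, best first.
        order = sorted(
            range(len(self.docs)),
            key=lambda i: self.score(query, i),
            reverse=True,
        )
        return order[:k]


# Placeholder passages standing in for text extracted from university documents.
passages = [
    "registration for the fall semester opens in august",
    "the library is open from eight to ten on weekdays",
    "graduate admission requires a bachelor degree",
]
question = "fall semester registration"

bm25 = BM25(passages)
best = bm25.top_k(question, k=1)[0]
context = passages[best]

# The retrieved passage becomes the grounding context for a generator model.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a full pipeline the `prompt` string would be sent to a language model. The abstract reports that sentence embedding models were evaluated alongside BM25, so the lexical scorer here could be swapped for an embedding-based similarity function without changing the surrounding steps.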

Original language: English
Title of host publication: ICFNDS 2023 - 2023 The 7th International Conference on Future Networks and Distributed Systems
Publisher: Association for Computing Machinery
Pages: 366-371
Number of pages: 6
ISBN (Electronic): 9798400709036
State: Published - 21 Dec 2023
Event: 7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023 - Dubai, United Arab Emirates
Duration: 21 Dec 2023 to 22 Dec 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023
Country/Territory: United Arab Emirates
City: Dubai
Period: 21/12/23 to 22/12/23

Bibliographical note

Publisher Copyright:
© 2023 ACM.

Keywords

  • Information Retrieval
  • Natural Language Processing

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
