Abstract
Intelligent chatbots such as ChatGPT 3.5 and Google's Bard have shown remarkable competence in generic question-answering (QA) tasks. However, their performance falters on domain-specific QA, particularly in Arabic, a language known for its complex morphology and syntax. This paper presents an approach to these issues, with the aim of building a chatbot tailored to a university community. We first create an extensive Arabic Q&A dataset by extracting data from academic documents with state-of-the-art Optical Character Recognition (OCR) tools. We then evaluate multiple text similarity measures, including pooled FastText word embeddings, the BM25 ranking function, and various semantic sentence-embedding models. The performance assessment shows that the domain-specific model excels at both sentence-level similarity and context-relevance tasks. The resulting web-application chatbot, built with the LangChain library and Retrieval-Augmented Generation (RAG) methods, outperforms existing chatbots in domain-specific Arabic QA scenarios.
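To make one of the similarity measures named in the abstract concrete, the following is a minimal sketch of Okapi BM25 scoring over a small set of stored questions. It is an illustration under stated assumptions, not the paper's implementation: the function name `bm25_scores`, the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the English placeholder questions are all hypothetical. Arabic text would likely need morphological normalization before tokenization.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Assumption: plain whitespace tokenization. The abstract does not
    describe the tokenizer or the parameter values used in the paper.
    """
    if not docs:
        return []
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n  # average document length

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical stored questions (placeholders, not from the paper's dataset).
stored_questions = [
    "when does course registration open",
    "where is the university library",
    "how do I pay tuition fees",
]
scores = bm25_scores("library opening hours", stored_questions)
best = max(range(len(scores)), key=scores.__getitem__)
print(stored_questions[best])
```

In a retrieval-augmented setup, the top-scoring entries from a ranker like this would typically be passed to the language model as context; dense sentence embeddings can replace or complement the lexical score.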
| Original language | English |
|---|---|
| Title of host publication | ICFNDS 2023 - 2023 The 7th International Conference on Future Networks and Distributed Systems |
| Publisher | Association for Computing Machinery |
| Pages | 366-371 |
| Number of pages | 6 |
| ISBN (Electronic) | 9798400709036 |
| State | Published - 21 Dec 2023 |
| Event | 7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023 - Dubai, United Arab Emirates. Duration: 21 Dec 2023 → 22 Dec 2023 |
Publication series
| Name | ACM International Conference Proceeding Series |
|---|---|
Conference
| Conference | 7th International Conference on Future Networks and Distributed Systems, ICFNDS 2023 |
|---|---|
| Country/Territory | United Arab Emirates |
| City | Dubai |
| Period | 21/12/23 → 22/12/23 |
Bibliographical note
Publisher Copyright: © 2023 ACM.
Keywords
- Information Retrieval
- Natural Language Processing
ASJC Scopus subject areas
- Human-Computer Interaction
- Computer Networks and Communications
- Computer Vision and Pattern Recognition
- Software