Duplicate Questions Pair Detection Using Siamese MaLSTM

  • Zainab Imtiaz
  • Muhammad Umer
  • Muhammad Ahmad
  • Saleem Ullah
  • Gyu Sang Choi
  • Arif Mehmood*

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

66 Scopus citations

Abstract

Quora is a growing platform built on a user-generated collection of questions and answers, which the users themselves create, edit, and organize. With so many users on the site, it is unavoidable that different users post multiple questions with similar intent, which raises the issue of duplicate questions. Detecting duplicate questions effectively would make it easier to find high-quality answers and would save time, which in turn would improve the experience of both writers and readers on Quora. In this paper, the Quora Question Pairs dataset is obtained from Kaggle for the detection of duplicate questions. First, three types of word embeddings (the Google News vector embedding, the FastText crawl embedding with 300 dimensions, and the FastText crawl subword embedding with 300 dimensions) are each used individually to vectorize all the questions and train the model. The final features used for prediction are a blend of these three types of word embeddings. Then, a Siamese MaLSTM ('Ma' for Manhattan distance) neural network model is applied to predict duplicate questions in the dataset. Finally, the model is tested on 100,000 pairs of questions. The experiments show that the proposed model achieves 91.14% accuracy, which is better than the state-of-the-art models it is compared against.
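As a rough illustration of the 'Ma' in MaLSTM, the sketch below scores a pair of question encodings by their Manhattan (L1) distance. It assumes the commonly used MaLSTM form, in which similarity is exp(-L1 distance); the abstract does not spell out the exact formula used in the paper. The function names, the toy vectors standing in for LSTM outputs, and the 0.5 decision threshold are all hypothetical.

```python
import math


def manhattan_similarity(h_left, h_right):
    """Return exp(-L1 distance) between two equal-length encodings.

    The result lies in (0, 1]; identical encodings score exactly 1.0.
    Assumption: this is the usual MaLSTM similarity, not necessarily
    the exact formula used in the paper.
    """
    if len(h_left) != len(h_right):
        raise ValueError("encodings must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(h_left, h_right))
    return math.exp(-l1_distance)


def is_duplicate(h_left, h_right, threshold=0.5):
    """Label a pair as duplicate when similarity reaches the threshold.

    The default threshold of 0.5 is a hypothetical choice for illustration.
    """
    return manhattan_similarity(h_left, h_right) >= threshold


# Toy vectors standing in for the final hidden states of the two
# weight-sharing LSTM branches (values are made up).
question_a = [0.20, -0.10, 0.40]
question_b = [0.25, -0.05, 0.35]
print(manhattan_similarity(question_a, question_b))
print(is_duplicate(question_a, question_b))
```

In a Siamese setup both questions pass through the same encoder, so the score depends only on how far apart the two encodings are, not on which question is fed to which branch.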

Original language: English
Article number: 8967103
Pages (from-to): 21932-21942
Number of pages: 11
Journal: IEEE Access
Volume: 8
DOIs
State: Published - 2020
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Duplicate question pair detection
  • MaLSTM
  • deep learning
  • text mining
  • word embedding

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
