A machine learning approach to extract keyphrases from bengali document using CNN-Bidirectional LSTM

Nishat Tasnim Ahmed Meem, Muhammad Mahir Hasan Chowdhury, Md Mahfuzur Rahman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Keyphrases are single-word or multi-word phrases that describe the principal topics of a document and give readers a quick overview of it. In this paper, we propose a system that combines a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) Recurrent Neural Network (RNN) to automatically detect keyphrases in a document. Preprocessing steps clean the text and generate the candidate keyphrases used to train the model. The CNN analyzes the semantic meaning of sentences, and the BiLSTM learns the relations among the words in those sentences. The model uses a pre-trained Bengali word embedding.
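The abstract does not spell out how candidate keyphrases are generated. As a minimal sketch of one common approach (not necessarily the authors' method), the snippet below splits text into sentences on the Bengali danda, tokenizes each sentence, and emits every n-gram of up to three words that neither starts nor ends with a stopword. The function name `candidate_keyphrases`, the `max_len` parameter, and the tiny stopword list are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for illustration);
# a real system would use a much fuller Bengali list.
STOPWORDS = {"এবং", "ও", "এই", "একটি", "থেকে", "করে"}

# Sentence boundaries: the Bengali danda plus "?" and "!".
SENTENCE_SPLIT = re.compile(r"[।?!]+")
# Token boundaries: whitespace and common punctuation.
TOKEN_SPLIT = re.compile(r"[\s,;:.()\"'-]+")


def candidate_keyphrases(text, max_len=3):
    """Return unique n-gram candidates (1..max_len words) in first-seen order.

    Candidates never cross a sentence boundary, and any n-gram that starts
    or ends with a stopword is discarded.
    """
    candidates = []
    seen = set()
    for sentence in SENTENCE_SPLIT.split(text):
        tokens = [t for t in TOKEN_SPLIT.split(sentence) if t]
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                phrase = " ".join(gram)
                if phrase not in seen:
                    seen.add(phrase)
                    candidates.append(phrase)
    return candidates


if __name__ == "__main__":
    sample = "বাংলা ভাষা এবং সাহিত্য। ঢাকা বাংলাদেশের রাজধানী।"
    for phrase in candidate_keyphrases(sample):
        print(phrase)
```

In a pipeline like the one the abstract describes, each candidate would then be mapped to word-embedding vectors and scored by the CNN-BiLSTM model; that scoring step is outside this sketch.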

Original language: English
Title of host publication: 2019 22nd International Conference on Computer and Information Technology, ICCIT 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728158426
State: Published - Dec 2019
Externally published: Yes

Publication series

Name: 2019 22nd International Conference on Computer and Information Technology, ICCIT 2019

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • BiLSTM
  • CNN
  • Convolutional Neural Network
  • FastText
  • Keyphrase Extraction
  • Neural Network
  • RNN
  • Word Embedding

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
