Self-supervised Segment Contrastive Learning for Medical Document Representation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Learning high-quality text embeddings is vital for biomedical topic classification and many other NLP tasks. Contrastive learning has shown remarkable performance in generating such embeddings. However, existing methods typically generate anchor-positive pairs through discrete augmentations, which simplifies the task of distinguishing positive from negative examples and limits the learning of meaningful representations. In this paper, we present a self-supervised segment contrastive learning (SCL) approach designed for contrastively fine-tuning pre-trained language models. Our method randomly divides documents into anchor and positive segments, facilitating the learning of document embeddings by maximizing agreement between these segments. The proposed model contrastively fine-tunes the pre-trained ClinicalBioBERT language model to generate document embeddings for medical documents. We evaluate our method on two publicly available medical datasets, MIMIC and BioASQ. Extensive experiments show that our proposed SCL approach outperforms baseline models on medical classification tasks.
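
The segment-pairing idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the split point is chosen or which contrastive objective is used, so everything below is an assumption: the helper names `split_into_segments` and `info_nce_loss` are hypothetical, the split is a single random cut, and the loss is a generic InfoNCE-style objective over cosine similarities with in-batch negatives. Embeddings are plain lists of floats standing in for encoder outputs.

```python
import math
import random


def split_into_segments(tokens, rng):
    """Randomly cut one document into two contiguous, non-empty segments.

    Assumption: a single random cut point; the paper may split differently.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form two segments")
    cut = rng.randint(1, len(tokens) - 1)  # each side keeps at least one token
    return tokens[:cut], tokens[cut:]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch.

    Anchor i should agree with positive i; the other positives in the
    batch act as negatives. This objective is an assumed stand-in.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Usage: split a toy document, then score a toy batch of embeddings.
rng = random.Random(0)
document = "patient admitted with chest pain and shortness of breath".split()
anchor_segment, positive_segment = split_into_segments(document, rng)

aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch the loss is lower when each anchor embedding lies closest to the embedding of its own document's positive segment, which is one way to read "maximizing agreement between these segments". In a real setup the embeddings would come from the fine-tuned language model, not from hand-written vectors.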

Original language: English
Title of host publication: Artificial Intelligence in Medicine - 22nd International Conference, AIME 2024, Proceedings
Editors: Joseph Finkelstein, Robert Moskovitch, Enea Parimbelli
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 312-321
Number of pages: 10
ISBN (Print): 9783031665370
State: Published - 2024
Externally published: Yes
Event: 22nd International Conference on Artificial Intelligence in Medicine, AIME 2024 - Salt Lake City, United States
Duration: 9 Jul 2024 → 12 Jul 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14844 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Artificial Intelligence in Medicine, AIME 2024
Country/Territory: United States
City: Salt Lake City
Period: 9/07/24 → 12/07/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Keywords

  • Contrastive learning
  • Document representation
  • Language models
  • Medical text

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
