Abstract
Learning high-quality text embeddings is vital for biomedical topic classification and many other NLP tasks. Contrastive learning has shown remarkable performance in generating such embeddings. However, existing methods typically generate anchor-positive pairs through discrete augmentations, which makes it too easy to distinguish positive from negative examples and limits the learning of meaningful representations. In this paper, we present a self-supervised segment contrastive learning (SCL) approach for contrastively fine-tuning pre-trained language models. Our method randomly divides each document into anchor and positive segments and learns document embeddings by maximizing agreement between these segments. The proposed model contrastively fine-tunes the pre-trained ClinicalBioBERT language model to generate document embeddings for medical documents. We evaluate our method on two publicly available medical datasets, MIMIC and BioASQ. Extensive experiments show that the proposed SCL approach outperforms baseline models on medical classification tasks.
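The segment-splitting idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the split is drawn or which contrastive objective is used, so everything below is an assumption: `split_document` cuts a token list at one random point, and `info_nce` is a standard InfoNCE loss with in-batch negatives and a hypothetical temperature of 0.1. In practice the vectors would come from an encoder such as ClinicalBioBERT; here they are plain lists of floats.

```python
import math
import random


def split_document(tokens, rng, min_len=1):
    """Split a token list at one random cut point into (anchor, positive).

    Assumption: a single cut; the paper may sample segments differently.
    """
    if len(tokens) < 2 * min_len:
        raise ValueError("document too short to split")
    cut = rng.randint(min_len, len(tokens) - min_len)  # inclusive bounds
    return tokens[:cut], tokens[cut:]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of segment embeddings.

    anchors[i] and positives[i] come from the same document; the
    positives of the other documents serve as in-batch negatives.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(anchors)
```

For example, `split_document(text.split(), random.Random(0))` yields one anchor-positive pair per document, and minimizing `info_nce` over the encoded pairs pulls segments of the same document together while pushing apart segments of different documents.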
| Original language | English |
|---|---|
| Title of host publication | Artificial Intelligence in Medicine - 22nd International Conference, AIME 2024, Proceedings |
| Editors | Joseph Finkelstein, Robert Moskovitch, Enea Parimbelli |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 312-321 |
| Number of pages | 10 |
| ISBN (Print) | 9783031665370 |
| DOIs | |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 22nd International Conference on Artificial Intelligence in Medicine, AIME 2024 - Salt Lake City, United States. Duration: 9 Jul 2024 → 12 Jul 2024 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 14844 LNAI |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 22nd International Conference on Artificial Intelligence in Medicine, AIME 2024 |
|---|---|
| Country/Territory | United States |
| City | Salt Lake City |
| Period | 9/07/24 → 12/07/24 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Keywords
- Contrastive learning
- Document representation
- Language models
- Medical text
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science
Fingerprint
Dive into the research topics of 'Self-supervised Segment Contrastive Learning for Medical Document Representation'. Together they form a unique fingerprint.