Abstract
Diabetic retinopathy (DR) is a leading cause of vision loss, necessitating early, accurate detection. Automated deep learning models show promise but struggle with the complexity of retinal images and limited labeled data. Due to domain differences, traditional transfer learning from datasets like ImageNet often fails in medical imaging. Self-supervised learning (SSL) offers a solution by enabling models to learn directly from medical data, but its success depends on the backbone architecture. Convolutional Neural Networks (CNNs) focus on local features, which can be limiting. To address this, we propose the Multi-scale Self-Supervised Learning (MsSSL) model, which combines Vision Transformers (ViTs) for global context with CNNs and a Feature Pyramid Network (FPN) for multi-scale feature extraction. These features are refined through a Deep Learner module, improving spatial resolution and capturing both high-level semantics and fine-grained detail. The MsSSL model significantly enhances DR grading, outperforming traditional methods, and underscores the value of domain-specific pretraining and advanced model integration in medical imaging.
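The FPN-style multi-scale fusion the abstract refers to can be illustrated with a toy top-down pathway: coarse feature maps are upsampled and merged into finer lateral maps so each level carries both fine and high-level information. The sketch below is a minimal NumPy illustration under assumed conventions (nearest-neighbour 2x upsampling, equal channel counts, the function names `upsample2x` and `fpn_topdown` are hypothetical); it is not the authors' implementation.

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def fpn_topdown(features):
    """FPN-style top-down fusion.

    `features` is ordered fine-to-coarse, each of shape (C, H, W) with the
    same channel count C and spatial sizes halving at each level. Starting
    from the coarsest map, each level is upsampled and added to the next
    finer lateral map, so every output level mixes scales.
    """
    fused = [features[-1]]                      # coarsest level unchanged
    for lateral in reversed(features[:-1]):     # walk coarse -> fine
        fused.append(lateral + upsample2x(fused[-1]))
    return list(reversed(fused))                # back to fine-to-coarse order

# Toy pyramid: 8x8, 4x4, 2x2 maps, 16 channels each
pyramid = [np.ones((16, 8, 8)), np.ones((16, 4, 4)), np.ones((16, 2, 2))]
out = fpn_topdown(pyramid)
# Finest level has accumulated contributions from all coarser levels.
```

In a real model, each lateral map would first pass through a 1x1 convolution to align channel counts, and upsampling would typically be learned or bilinear; the additive merge above is the structural idea.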
| Original language | English |
|---|---|
| Article number | 33742 |
| Journal | Scientific Reports |
| Volume | 15 |
| Issue number | 1 |
| DOIs | |
| State | Published - Dec 2025 |
Bibliographical note
Publisher Copyright: © The Author(s) 2025.
Keywords
- CBAM
- Diabetic retinopathy grading
- Feature pyramid network
- Self-supervised learning
- Vision transformer
ASJC Scopus subject areas
- General