Abstract
Autonomous visual navigation of drones in indoor environments remains challenging because reliable localization signals are unavailable and the surroundings can be complex and dynamic. Robust, cost-effective navigation solutions are therefore essential for real-world inspection and monitoring applications. This work presents navCLIP, a novel, low-cost visual navigation model for indoor environments. navCLIP builds on a pre-trained CLIP (contrastive language-image pre-training) model for vision-text encoding and adds an MLP (multilayer perceptron) projection layer. This layer is trained via contrastive learning to align 1) the output of an ensemble encoder that combines monocular depth estimation, semantic segmentation, and a scene description with 2) discrete action texts (e.g., 'move forward' and 'rotate left/right'). During inference, navCLIP selects the drone's next action as the action text with the highest cosine similarity to the encoded observation. The proposed model was evaluated in a realistic simulated indoor environment, built with AirSim and Unreal Engine, that replicates the corridor area of the SDAIA-KFUPM JRCAI at KFUPM. On the collected data, navCLIP achieves higher accuracy in selecting the best action than baseline models such as convolutional neural networks (CNNs). Moreover, when reinforcement-learning agents navigate the same environment, navCLIP yields better collision avoidance than other reinforcement-learning methods such as DQN (Deep Q-Network) and enhanced DQN.
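The alignment and action-selection steps described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the depth map, segmentation map, and scene description have already been embedded by a frozen CLIP encoder, that the three embeddings are fused by concatenation, that the projection is a two-layer ReLU MLP, and that the contrastive objective is an InfoNCE-style cross-entropy over the action texts with temperature 0.07. The names `MLPProjection`, `fuse`, `contrastive_action_loss`, and `select_action`, as well as all dimensions, are hypothetical. Only the forward pass and loss are shown; a training loop would need an autodiff framework.

```python
import numpy as np

ACTIONS = ["move forward", "rotate left", "rotate right"]  # action texts named in the abstract

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

class MLPProjection:
    """Hypothetical two-layer MLP mapping fused observation features into CLIP's text space."""
    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.02, (hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2 + self.b2

def fuse(depth_emb, seg_emb, caption_emb):
    """Assumed fusion: concatenate the three per-modality CLIP embeddings."""
    return np.concatenate([depth_emb, seg_emb, caption_emb], axis=-1)

def log_softmax(z, axis):
    """Numerically stable log-softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def contrastive_action_loss(obs_emb, action_text_emb, labels, temperature=0.07):
    """InfoNCE-style loss: each observation should score its own action text above the others."""
    logits = l2_normalize(obs_emb) @ l2_normalize(action_text_emb).T / temperature
    log_probs = log_softmax(logits, axis=1)
    return -log_probs[np.arange(len(labels)), labels].mean()

def select_action(obs_emb, action_text_emb):
    """Return the index of the action text with the highest cosine similarity, plus all scores."""
    sims = l2_normalize(action_text_emb) @ l2_normalize(obs_emb)
    return int(np.argmax(sims)), sims

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    D = 512  # assumed CLIP embedding size
    proj = MLPProjection(3 * D, 1024, D)
    # Random stand-ins for frozen CLIP embeddings of depth, segmentation, and scene description
    obs = proj(fuse(rng.normal(size=(4, D)), rng.normal(size=(4, D)), rng.normal(size=(4, D))))
    action_emb = rng.normal(size=(len(ACTIONS), D))  # stand-in CLIP text embeddings
    labels = np.array([0, 1, 2, 0])  # hypothetical ground-truth actions
    print("loss:", contrastive_action_loss(obs, action_emb, labels))
    best, _ = select_action(obs[0], action_emb)
    print("chosen action:", ACTIONS[best])
```

Scoring each observation against the small, fixed set of action texts (rather than against other observations in the batch) sidesteps false negatives when several samples share the same action; whether navCLIP uses this or a batch-symmetric CLIP loss is not stated here.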
| Original language | English |
|---|---|
| Pages (from-to) | 78122-78132 |
| Number of pages | 11 |
| Journal | IEEE Access |
| Volume | 14 |
| DOIs | |
| State | Published - 2026 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- CLIP fine-tuning
- GPS-denied indoor drones
- Visual navigation
- collision avoidance
- contrastive learning
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering
Fingerprint
Dive into the research topics of 'NavCLIP: A Novel CLIP-Guided Drone Navigation Model'. Together they form a unique fingerprint.