Semantic annotation of images is the process of assigning metadata, in the form of captions, to a digital image. It is an important step for indexing and searching images in large databases. In this paper, we present a framework for automatic semantic annotation of images and explore its effectiveness in annotating images based on both the image’s low-level features and its surrounding text. In the proposed framework, image features are extracted using convolutional neural networks, while words in the surrounding text are represented by word-embedding vectors. Both modalities are further processed by recurrent neural networks with long short-term memory cells and an attention mechanism to generate an annotation sentence that describes the image. Empirical evaluations of the proposed framework on a news dataset show promising performance that is comparable to the results of recent image annotation systems. The produced free-text semantic annotations can be further converted into a structured Resource Description Framework (RDF) representation that enables more expressive queries across diverse sources of images.
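The attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which attention formulation the framework uses, so plain dot-product soft attention is assumed here, and the names `attend`, `image_features`, `text_embeddings`, and `hidden_state` are hypothetical. The toy vectors stand in for CNN image features, word embeddings, and an LSTM hidden state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product soft attention (an assumed formulation, not the paper's).

    query: decoder hidden state, a list of floats.
    keys:  feature vectors to attend over, each the same length as query.
    Returns (weights, context), where context is the weighted sum of keys.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return weights, context


# Toy inputs: two image-region features and one word embedding (hypothetical values).
image_features = [[1.0, 0.0], [0.0, 1.0]]
text_embeddings = [[1.0, 1.0]]
hidden_state = [1.0, 0.0]

# Both modalities are pooled into one set of vectors to attend over.
weights, context = attend(hidden_state, image_features + text_embeddings)
```

In a full model, the resulting context vector would be fed to the LSTM decoder at each step to help predict the next word of the annotation sentence.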
Bibliographical note
Publisher Copyright:
© 2022, King Fahd University of Petroleum & Minerals.
- Feature extraction
- Image processing
- Semantic annotation