
AN IMAGE RETRIEVAL MODEL USING KNOWLEDGE GRAPH AND BAG OF VISUAL WORDS

About this article

Received: 17/04/25                Revised: 29/06/25                Published: 30/06/25

Authors

1. Tran Duc Tai, Ho Chi Minh City University of Education
2. Nguyen Ngoc Sang, Ho Chi Minh City University of Education
3. To Thanh Tuan, Ho Chi Minh City University of Education
4. Nguyen Do Thai Nguyen, Ho Chi Minh City University of Education

Abstract


As demand grows for image retrieval based on content and semantic understanding, traditional techniques that rely solely on visual features increasingly show their limitations, especially in representing semantic relationships among the entities within an image. This study proposes an integrated model comprising three key components: entity detection using YOLOv8, visual feature representation through the bag-of-visual-words model, and information organization via a knowledge graph. Detected entities are encoded as bags of visual words, from which relational triples are constructed and mapped into the knowledge graph. At query time, the system generates triples from the input image and performs semantic retrieval within the knowledge graph. The model was evaluated on two widely used image datasets, OpenImagesV7 and MS-COCO, achieving accuracies of 84.1% and 89.6%, respectively. These results outperform many traditional approaches, demonstrating the reliability and feasibility of the proposed model.
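The abstract describes the pipeline only at a high level, so the following is a minimal sketch of how its three stages (YOLOv8 detection, bag-of-visual-words encoding, triple construction into a knowledge graph) could fit together. It assumes the ultralytics YOLOv8 API, ORB local descriptors, a scikit-learn KMeans codebook, a NetworkX graph, and a simple co-occurrence relation; the paper does not specify these particular choices (weights file, feature type, codebook size, relation set), so treat them as placeholders rather than the authors' implementation.

# Hedged sketch of the pipeline in the abstract: detection -> BoVW encoding -> triples.
# Assumptions (not stated in the paper): YOLOv8n weights, ORB descriptors,
# a KMeans visual-word codebook, and a "co_occurs_with" relation between objects.
import cv2
import numpy as np
import networkx as nx
from ultralytics import YOLO
from sklearn.cluster import KMeans

detector = YOLO("yolov8n.pt")   # pretrained YOLOv8 object detector
orb = cv2.ORB_create()          # local features for the bag-of-visual-words step

def detect_entities(image_path):
    """Return (label, crop) pairs for every object YOLOv8 detects in the image."""
    result = detector(image_path)[0]
    image = cv2.imread(image_path)
    entities = []
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        label = result.names[int(box.cls[0])]
        entities.append((label, image[y1:y2, x1:x2]))
    return entities

def bovw_histogram(crop, codebook):
    """Encode one object crop as a normalised bag-of-visual-words histogram."""
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    hist = np.zeros(codebook.n_clusters)
    if desc is not None:
        for word in codebook.predict(desc.astype(np.float32)):
            hist[word] += 1
    return hist / max(hist.sum(), 1)

def add_image_to_graph(graph, image_id, image_path, codebook):
    """Map one image's entities and relational triples into the knowledge graph."""
    entities = detect_entities(image_path)
    for label, crop in entities:
        graph.add_node(f"{image_id}:{label}", image=image_id,
                       bovw=bovw_histogram(crop, codebook))
    # Assumed relation: objects detected in the same image co-occur.
    for (a, _), (b, _) in zip(entities, entities[1:]):
        graph.add_edge(f"{image_id}:{a}", f"{image_id}:{b}",
                       relation="co_occurs_with")

In use, the KMeans codebook would first be fit on ORB descriptors pooled from the indexed images; a query would then run detect_entities and bovw_histogram on the input image, form the same kind of triples, and match them against the graph (by label, relation, and histogram similarity) to rank the stored images.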

Keywords


Image retrieval; Bag of visual words; Knowledge graph; YOLOv8; Object detection

DOI: https://doi.org/10.34238/tnu-jst.12608
