AN IMAGE CAPTIONING MODEL COMBINING KNOWLEDGE GRAPHS AND DEEP LEARNING NETWORKS
Article information
Received: 17/04/25                Revised: 16/06/25                Published: 27/06/25
Abstract
Keywords
Full text:
PDF
DOI: https://doi.org/10.34238/tnu-jst.12614