A STUDY OF IMAGE-TO-VIDEO GENERATION MODELS: A COMPREHENSIVE REVIEW
Article Information
Received: 19/10/25                Revised: 30/12/25                Published: 31/12/25
Abstract
Keywords
Full text:
PDF
DOI: https://doi.org/10.34238/tnu-jst.13790