DESIGN AND DEVELOPMENT OF VIETNAMESE SIGN LANGUAGE TRANSLATION SOFTWARE

Trần Vũ Hoàng; Lê Quốc Đạt; Huỳnh Đình Hiệp; Đoàn Mạnh Cường

doi:10.34238/tnu-jst.12232

Article information

Received: 06/03/25; Revised: 16/06/25; Published: 27/06/25

Authors

1. Trần Vũ Hoàng, Ho Chi Minh City University of Technology and Education
2. Lê Quốc Đạt, Ho Chi Minh City University of Technology and Education
3. Huỳnh Đình Hiệp, Southern Telecommunications Software Joint Stock Company
4. Đoàn Mạnh Cường, University of Information and Communication Technology, Thai Nguyen University

Abstract

In the current era of rapid technological development, applications of artificial intelligence worldwide contribute significantly to socio-economic development. As society develops, information changes daily and even hourly, so keeping up with new information is difficult for people whose access to information is limited, who face language barriers, or who have disabilities. In this study, we propose a method for designing and building translation software for deaf and hard-of-hearing people that incorporates sign language and is based on natural language processing, deep learning models, and computer vision. The goal is a system that converts information in text or audio form into short videos expressed in sign language. In experiments, the system met all of the stated requirements. It can convert a text or an audio file into a video that deaf and hard-of-hearing people can understand, with a video rendering time of approximately 20 s per word (or phrase).
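To give a sense of what the reported rendering speed means in practice, the following minimal sketch estimates the total rendering time for an input that has already been segmented into words or phrases. It assumes, as a simplification that the abstract does not state, that rendering time scales linearly with the number of units at roughly 20 s each. The function name `estimate_render_seconds` and the example segmentation are hypothetical and are not part of the described system.

```python
# Approximate rendering cost per word (or phrase), taken from the abstract.
SECONDS_PER_UNIT = 20.0


def estimate_render_seconds(units, seconds_per_unit=SECONDS_PER_UNIT):
    """Estimate the rendering time for a list of segmented words or phrases.

    Assumption (not from the paper): the cost is linear in the number of
    units, with no fixed start-up overhead.
    """
    if seconds_per_unit < 0:
        raise ValueError("seconds_per_unit must be non-negative")
    return len(units) * seconds_per_unit


# Hypothetical segmentation of a short sentence into four units.
example_units = ["xin chào", "tôi", "là", "sinh viên"]
estimated = estimate_render_seconds(example_units)
print(f"{len(example_units)} units, about {estimated:.0f} s to render")
```

Under this linear assumption, a sentence of four units would take on the order of 80 s to render, which suggests that the reported speed suits prepared content better than live interpretation.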

Keywords

Vietnamese sign language translation; AlphaPose; SMPL; PhoWhisper; Blender Python API
