
DESIGN AND BUILD A VIETNAMESE SIGN LANGUAGE TRANSLATION APPLICATION

About this article

Received: 06/03/25                Revised: 16/06/25                Published: 27/06/25

Authors

1. Tran Vu Hoang, Ho Chi Minh City University of Technology and Education
2. Le Quoc Dat, Ho Chi Minh City University of Technology and Education
3. Huynh Dinh Hiep, South Telecommunication & Software JSC
4. Doan Manh Cuong, TNU - University of Information and Communication Technology

Abstract


In today's rapidly developing technological era, artificial intelligence applications worldwide contribute significantly to economic and social development. This swift advancement is accompanied by a constantly changing flow of information, which poses a considerable challenge for people with limited access to information, language barriers, or disabilities who must keep up with new information. In this study, we propose a method to design and develop translation software for the hearing-impaired that incorporates sign language and is based on natural language processing, deep learning models, and computer vision. The goal is to build a system that converts information in the form of text or audio into short videos represented in sign language. After experimentation, the system met all of the specified requirements: it can convert a text or audio file into a video that the hearing-impaired can understand, with a rendering time of approximately 20 seconds per word (phrase).
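To make the described pipeline concrete, the following Python sketch shows one way a text/audio-to-sign-video flow could be wired together. It is not the authors' implementation: it assumes PhoWhisper is loaded through the Hugging Face transformers library (the checkpoint name vinai/PhoWhisper-small is an assumption), and that a sign clip for each word or phrase has already been rendered to an MP4 file named after its gloss; the sign_clips folder, the toy GLOSS_DICT mapping, and the input file name are purely illustrative.

# Minimal sketch of an audio-to-sign-clip pipeline (illustrative assumptions only).
from pathlib import Path
from transformers import pipeline

CLIP_DIR = Path("sign_clips")  # hypothetical folder of pre-rendered sign clips
GLOSS_DICT = {"xin chào": "hello", "cảm ơn": "thank_you"}  # toy Vietnamese-to-gloss map

# Load a PhoWhisper checkpoint for Vietnamese speech recognition (assumed name).
asr = pipeline("automatic-speech-recognition", model="vinai/PhoWhisper-small")

def audio_to_text(audio_path: str) -> str:
    """Transcribe Vietnamese speech to lower-cased text with PhoWhisper."""
    return asr(audio_path)["text"].lower()

def text_to_clips(text: str) -> list[Path]:
    """Map recognised words/phrases to the corresponding pre-rendered sign clips."""
    clips = []
    for phrase, gloss in GLOSS_DICT.items():
        if phrase in text:
            clip = CLIP_DIR / f"{gloss}.mp4"
            if clip.exists():
                clips.append(clip)
    return clips

if __name__ == "__main__":
    transcript = audio_to_text("news_bulletin.wav")  # hypothetical input file
    print("Clips to concatenate:", text_to_clips(transcript))

In a full system, the selected clips would then be concatenated (or regenerated from SMPL poses via the Blender Python API) into a single output video; that rendering step is what the reported ~20 seconds per word (phrase) refers to.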

Keywords


Vietnamese sign language translation; AlphaPose; SMPL; PhoWhisper; Blender Python API

DOI: https://doi.org/10.34238/tnu-jst.12232
