RECOGNIZING VIETNAMESE SIGN LANGUAGE USING DEEP NEURAL NETWORKS

About this article

Received: 29/04/2025    Revised: 26/06/2025    Published: 28/06/2025

Authors

1. Nguyen Quang Duy, University of Transport and Communications
2. Luong Thai Le, University of Transport and Communications

Abstract


Vietnamese sign language plays a pivotal role in enabling effective communication for deaf and hard-of-hearing communities throughout Vietnam. In this study, we propose a deep learning-based recognition system that uses MediaPipe to extract hand landmarks from video sequences. These landmarks are then processed by one of two architectures: a convolutional neural network, or a long short-term memory network enhanced with an attention mechanism (additive or multi-head) that selectively highlights salient temporal patterns in sign gestures. To support robust training and evaluation, we compiled and carefully annotated a comprehensive dataset of Vietnamese sign language gestures. Experimental results demonstrate that the proposed model attains a recognition accuracy of 99.51%, outperforming baseline approaches. The system’s real-time performance and high precision highlight its potential as the basis for practical assistive communication tools, paving the way for further research in sign language processing and cross-cultural gesture recognition applications within the Vietnamese context.
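To make the pipeline concrete, the sketch below shows one plausible realization of what the abstract describes: MediaPipe Hands extracting 21 three-dimensional landmarks per hand from each video frame, followed by an LSTM with multi-head self-attention over the frame sequence. This is not the authors' released code; the constants SEQ_LEN and NUM_CLASSES and the helper names are illustrative assumptions.

```python
import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf

SEQ_LEN = 30           # assumed number of frames sampled per gesture clip
NUM_CLASSES = 50       # assumed number of sign classes in the dataset
FEATURES = 21 * 3 * 2  # 21 landmarks x (x, y, z) x up to 2 hands

def landmarks_from_video(path):
    """Return a (SEQ_LEN, FEATURES) array of hand landmarks for one clip."""
    frames = []
    with mp.solutions.hands.Hands(static_image_mode=False,
                                  max_num_hands=2) as hands:
        cap = cv2.VideoCapture(path)
        while cap.isOpened() and len(frames) < SEQ_LEN:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV delivers BGR frames.
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            vec = np.zeros(FEATURES, dtype=np.float32)
            if result.multi_hand_landmarks:
                coords = []
                for hand in result.multi_hand_landmarks[:2]:
                    for lm in hand.landmark:
                        coords.extend([lm.x, lm.y, lm.z])
                vec[:len(coords)] = coords
            frames.append(vec)
        cap.release()
    while len(frames) < SEQ_LEN:  # zero-pad clips shorter than SEQ_LEN
        frames.append(np.zeros(FEATURES, dtype=np.float32))
    return np.stack(frames)

def build_model():
    """LSTM encoder followed by multi-head self-attention over time."""
    inp = tf.keras.Input(shape=(SEQ_LEN, FEATURES))
    x = tf.keras.layers.LSTM(128, return_sequences=True)(inp)
    # Self-attention re-weights the per-frame representations so the
    # most informative moments of the gesture dominate the pooled vector.
    x = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A training loop would stack landmarks_from_video outputs into arrays and call model.fit on them; replacing the LSTM block with 1-D convolutions would give the CNN variant the abstract also mentions.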

Keywords


Vietnamese sign language; Convolutional neural network; Long short-term memory; Attention mechanism; Computer vision

DOI: https://doi.org/10.34238/tnu-jst.12708
