ADVANCING EMOTION RECOGNITION IN VIETNAMESE: A PHOBERT-BASED APPROACH FOR ENHANCED INTERACTION | Trâm | TNU Journal of Science and Technology

About this article

Received: 26/05/25                Revised: 29/06/25                Published: 29/06/25

Authors

1. Huynh Thi Ngoc Tram, International University - Vietnam National University Ho Chi Minh City; Vietnam National University Ho Chi Minh City
2. Pham Minh Dzuy, University of Information Technology - Vietnam National University Ho Chi Minh City; Vietnam National University Ho Chi Minh City
3. Pham Duc Dat, International University - Vietnam National University Ho Chi Minh City; Vietnam National University Ho Chi Minh City
4. Le Duy Tan, International University - Vietnam National University Ho Chi Minh City; Vietnam National University Ho Chi Minh City
5. Huynh Kha Tu, International University - Vietnam National University Ho Chi Minh City; Vietnam National University Ho Chi Minh City

Abstract


Emotion recognition using artificial intelligence is essential for improving human-machine interaction in healthcare, education, and smart homes. To address Vietnamese-specific challenges such as tonality and context-dependent meaning, we built a high-quality dataset from social media, product reviews, and conversational dialogues. Rigorous preprocessing (cleaning, normalization, tokenization) improved data reliability, and oversampling addressed class imbalance. PhoBERT-base-v2, a Vietnamese-optimized Transformer, achieved state-of-the-art accuracy (94.22%) with macro-averaged metrics above 94%, significantly outperforming traditional machine-learning and other deep-learning methods. Analysis showed that the model separates nuanced emotions well, although confusion persisted between semantically similar feelings (e.g., Anger vs. Disgust). We demonstrated practical deployment through a Gradio interface for real-time sentiment analysis, illustrating potential applications such as social media monitoring, customer feedback analysis, and mental health support. Future work includes multimodal approaches that combine text and speech for higher accuracy.
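
As an illustration of two ideas mentioned in the abstract, class balancing by oversampling and macro-averaged evaluation, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the abstract does not say which oversampling method was used, so random duplication of minority-class samples is an assumption here, and the function names `random_oversample` and `macro_f1` as well as the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one.

    Assumption: plain random oversampling; the source does not name the method.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three "Joy" samples, one "Anger", one "Disgust".
texts = ["a", "b", "c", "d", "e"]
labels = ["Joy", "Joy", "Joy", "Anger", "Disgust"]
bal_texts, bal_labels = random_oversample(texts, labels)
class_counts = Counter(bal_labels)  # every class now has 3 samples

# Hypothetical predictions with one Anger/Disgust confusion.
y_true = ["Anger", "Anger", "Disgust", "Joy"]
y_pred = ["Anger", "Disgust", "Disgust", "Joy"]
score = macro_f1(y_true, y_pred)  # 7/9, about 0.778
```

In a sketch like this, oversampling would normally be applied to the training split only, so that duplicated samples do not leak into the evaluation data.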

Keywords


Emotion recognition; Vietnamese natural language processing; Deep learning models; Sentiment analysis; Artificial intelligence

DOI: https://doi.org/10.34238/tnu-jst.12889

TNU Journal of Science and Technology