A RESEARCH ON VIETNAMESE-K’HO LANGUAGE TRANSLATION SYSTEM USING NEURAL MACHINE TRANSLATION | Lương | TNU Journal of Science and Technology

A RESEARCH ON VIETNAMESE-K’HO LANGUAGE TRANSLATION SYSTEM USING NEURAL MACHINE TRANSLATION

About this article

Received: 29/10/22; Revised: 31/03/23; Published: 07/04/23

Authors

1. Nguyen Thi Luong, Dalat University
2. La Quoc Thang, Dalat University
3. Tran Nhat Quang, Dalat University
4. Duong Bao Ninh, Dalat University
5. Nguyen Huu Khanh, Dalat University
6. Phan Thi Thanh Nga, Dalat University
7. Tran Ngo Nhu Khanh, Dalat University
8. Tran Thong, Dalat University

Abstract


The K'Ho language is spoken by the K'Ho ethnic group, who live in the South Central Highlands, especially in the districts of Don Duong, Duc Trong, Di Linh, Da Huoai, and Lac Duong in Lam Dong province. The provincial People's Committee and the Ethnic Minority Committee of Lam Dong province currently encourage cadres and officials in the province to learn the K'Ho language so that they can communicate with the K'Ho people and disseminate the guidelines, policies, and laws of the Party and government. In this paper, we draw on K'Ho language resources and the support of many K'Ho language experts to build a Vietnamese-K'Ho bilingual corpus, contributing to the promotion and preservation of the K'Ho language. The corpus contains more than 16,000 Vietnamese-K'Ho bilingual sentence pairs, which were difficult to collect because K'Ho language resources are limited. We then use the OpenNMT framework to build an automatic translation system trained on the collected bilingual data. The system reaches an accuracy of 56.54%, which is an acceptable result in the automatic translation field.

Keywords


K'Ho language; Bilingual Corpus; Automatic translation; RNN; OpenNMT






DOI: https://doi.org/10.34238/tnu-jst.6818
