DEEP LEARNING-BASED AUTOMATIC CONTROL SYSTEM USING AUDIO SIGNAL ANALYSIS FOR VIETNAMESE RECOGNITION
About this article
Received: 20/03/25; Revised: 09/07/25; Published: 09/07/25
DOI: https://doi.org/10.34238/tnu-jst.12357