COMPARISON OF MACHINE LEARNING ALGORITHMS FOR SENTIMENT ANALYSIS OF VIETNAMESE YOUTUBE SUBTITLES

About this article

Received: 20/02/24; Revised: 23/05/24; Published: 24/05/24

Authors

1. Nguyen Trong Tu, Le Quy Don Technical University
2. Nguyen Trung Tin, Le Quy Don Technical University

Abstract


YouTube has become one of the most significant online platforms, attracting a vast user base, with over a billion hours of video watched every day. Recently, foreign reactionary forces and extremist organizations have exploited YouTube to disseminate videos undermining the Party, the State, and the Vietnamese military. This study analyzes Vietnamese subtitles collected from YouTube videos, applying machine learning algorithms to perform sentiment analysis and classify the subtitles. The analysis offers insight into the emotions and perspectives of the online community regarding YouTube content, particularly content related to politics and society. The study compares four machine learning algorithms: Naive Bayes, Random Forest, Support Vector Machine, and Logistic Regression. Random Forest achieved the highest accuracy, 81%, outperforming the other three algorithms in analyzing the sentiment of subtitles from YouTube videos with negative content.
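The abstract names the four classifiers but gives no implementation detail. As a rough illustration of the simplest of them, Naive Bayes, here is a minimal pure-Python sketch of a multinomial Naive Bayes sentiment classifier with Laplace (add-one) smoothing. It assumes subtitles have already been split into whitespace-separated tokens and labelled `neg` or `pos`. The toy sentences, the labels, and the helper names `train_nb`, `predict_nb` and `accuracy` are hypothetical; they are not the authors' dataset, preprocessing, or code.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    class_counts = Counter(labels)        # number of documents per class
    word_counts = defaultdict(Counter)    # per-class token frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc, alpha=1.0):
    """Return the class with the highest log-posterior (add-alpha smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(c)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for token in doc.lower().split():
            if token not in vocab:
                continue  # skip tokens never seen during training
            # log likelihood P(token | c), smoothed so unseen pairs are not zero
            score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy subtitles, NOT the authors' data.
train_docs = [
    "video xuyên tạc sự thật",
    "nội dung kích động chống phá",
    "bài viết hay và bổ ích",
    "video rất hay và ý nghĩa",
]
train_labels = ["neg", "neg", "pos", "pos"]
test_docs = ["video xuyên tạc chống phá", "video rất hay"]
test_labels = ["neg", "pos"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, test_docs[0]))            # neg
print(accuracy(model, test_docs, test_labels))    # 1.0
```

Working in log space avoids floating-point underflow when many per-token probabilities are multiplied together. The other three algorithms (Random Forest, Support Vector Machine, Logistic Regression) would typically be trained on the same bag-of-words or TF-IDF features and scored with the same accuracy metric. The abstract does not state which features or library the authors used.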

Keywords


Machine learning; YouTube subtitles; Sentiment analysis; Subtitle classification; Algorithm comparison

DOI: https://doi.org/10.34238/tnu-jst.9741

TNU Journal of Science and Technology