
3D CHARACTER EXPRESSION ANIMATION ACCORDING TO VIETNAMESE SENTENCE SEMANTICS

About this article

Received: 13/08/22                Revised: 07/10/22                Published: 07/10/22

Authors

1. Do Thi Chi, TNU - University of Information and Communication Technology
2. Le Son Thai, TNU - University of Information and Communication Technology

Abstract


Facial expressions are a primary form of nonverbal communication and a main means of conveying social information between people. In a virtual reality program or game, a compelling 3D character needs to be able to act and express emotions clearly and coherently. Animation studies show that characters need to represent at least six basic emotions: happiness, sadness, fear, disgust, anger, and surprise. However, generating expression animations for virtual characters is time-consuming and requires a lot of creativity. The main objective of this article is to generate expression animations combined with lip synchronization for 3D characters according to the semantics of Vietnamese sentences. Our method is based on the blendshape weights of the 3D face model. After emotion prediction, the input text is passed to a lip-sync and emotion generator that drives the 3D facial animation. In our experiments, 200 Vietnamese sentences were automatically classified into the six emotions. We then conducted a survey in which participants were asked to recognize the emotion shown by the 3D virtual face for each sentence of input text. Survey results show that anger is the most recognizable emotion, while happiness and excitement are easily confused.
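
The pipeline described above (input text, emotion prediction, then a lip-sync and emotion generator producing blendshape weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the blendshape names, the emotion-to-weight mapping, and the linear mixing rule below are assumptions made only for illustration.

```python
# Hypothetical sketch: combining an emotion pose with per-frame viseme
# (lip-sync) weights into blendshape weights for a 3D face. The blendshape
# names and the mixing rule are assumptions, not the paper's actual method.

# Example emotion poses expressed as blendshape weights in [0, 1].
EMOTION_BLENDSHAPES = {
    "anger":     {"browDown": 0.9, "jawClench": 0.6, "eyeSquint": 0.5},
    "happiness": {"mouthSmile": 0.8, "cheekRaise": 0.6},
}

def blend_frame(emotion, viseme_weights, emotion_strength=0.7):
    """Merge an emotion pose with lip-sync viseme weights for one frame.

    viseme_weights: dict of mouth blendshape weights produced by the
    lip-sync step, e.g. {"jawOpen": 0.4, "mouthPucker": 0.2}.
    """
    # Start from the scaled emotion pose.
    frame = {name: w * emotion_strength
             for name, w in EMOTION_BLENDSHAPES.get(emotion, {}).items()}
    # Lip-sync shapes take priority on the mouth region; clamp to [0, 1].
    for name, w in viseme_weights.items():
        frame[name] = min(1.0, max(frame.get(name, 0.0), w))
    return frame

if __name__ == "__main__":
    # One animation frame for an "anger" sentence while the mouth opens.
    print(blend_frame("anger", {"jawOpen": 0.5}))
```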


Keywords


Facial expression; Emotion; Lipsync; Animation; Virtual reality


DOI: https://doi.org/10.34238/tnu-jst.6361
