A LIGHTWEIGHT ATTENTION MODEL FOR SPEECH SEPARATION
Article Information
Received: 01/10/25                Revised: 30/12/25                Published: 31/12/25
Abstract
Keywords
Full text: PDF
References
[1] A. Mehrish, N. Majumder, R. Bharadwaj, R. Mihalcea, and S. Poria, “A review of deep learning techniques for speech processing,” Information Fusion, vol. 99, 2023, Art. no. 101869.
[2] Y. Luo and N. Mesgarani, “Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 8, pp. 1256–1266, 2019.
[3] Y. Luo, Z. Chen, and T. Yoshioka, “Dual-path RNN: Efficient long sequence modeling for time-domain single-channel speech separation,” in Proc. ICASSP, 2020, pp. 46–50.
[4] A. Gulati et al., “Conformer: Convolution-augmented Transformer for speech recognition,” in Proc. Interspeech, 2020, pp. 5036–5040.
[5] C. Subakan, M. Ravanelli, S. Cornell, M. Bronzi, and J. Zhong, “Attention is all you need in speech separation,” in Proc. ICASSP, 2021, pp. 21–25.
[6] K. Tan, Y. Zhang, and D. Wang, “Deep learning based real-time speech separation for mobile devices,” IEEE Signal Process. Lett., vol. 28, pp. 1–5, 2021.
[7] Y. Xiang and D. Wang, “Lightweight speech separation with depthwise separable convolutions,” in Proc. ICASSP, 2022, pp. 126–130.
[8] H. Li, L. Chen, and Z. Huang, “Resource-efficient speech enhancement via mobile architectures,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 30, pp. 1234–1247, 2022.
[9] S. Zhang et al., “Designing efficient neural networks for on-device speech separation,” Neural Networks, vol. 157, pp. 98–109, 2023.
[10] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proc. ECCV, 2018, pp. 3–19.
[11] Z. Yao, W. Pei, F. Chen, G. Lu, and D. Zhang, “Stepwise-refining speech separation network via fine-grained encoding in high-order latent domain,” arXiv preprint, 2021.
[12] J. Wang, “An efficient speech separation network based on recurrent fusion dilated convolution and channel attention,” in Proc. Interspeech, 2023, pp. 3699–3703.
[13] H. Ma, “A novel end-to-end deep separation network based on the attention mechanism,” IET Signal Process., vol. 17, no. 2, pp. 1–10, 2023.
[14] K. Wang, H. Zhou, J. Cai, and W. Li, “Time-domain adaptive attention network for single-channel speech separation,” EURASIP J. Audio, Speech, Music Process., vol. 21, pp. 1–15, 2023.
[15] A. Défossez, G. Synnaeve, and Y. Adi, “Real time speech enhancement in the waveform domain,” in Proc. Interspeech, Sep. 2020, pp. 3291–3295, doi: 10.21437/Interspeech.2020-2309.
[16] Y. Luo and N. Mesgarani, “TasNet: Time-domain audio separation network for real-time, single-channel speech separation,” in Proc. ICASSP, Apr. 2018, pp. 696–700.
[17] S. Wisdom, E. Tzinis, H. Erdogan, R. Weiss, K. Wilson, and J. R. Hershey, “Unsupervised sound separation using mixture invariant training,” in Proc. NeurIPS, 2020, pp. 1–12.
[18] H. M. Tan, D.-Q. Vu, and J.-C. Wang, “SeliNet: A lightweight model for single channel speech separation,” in Proc. ICASSP, 2023, pp. 1–5.
[19] S. Chen, Y. Wu, Z. Chen, J. Wu, J. Li, T. Yoshioka, C. Wang, S. Liu, and M. Zhou, “Continuous speech separation with Conformer,” in Proc. ICASSP, 2021, pp. 5749–5753.
DOI: https://doi.org/10.34238/tnu-jst.13724
Citing articles
- No citing articles at present





