Evaluation of Word Embedding Techniques for the Vietnamese SMS Spam Detection Model
Keywords:
Vietnamese spam, SMS spam, deep learning, CNN, word embedding
Abstract
The growing volume of Vietnamese SMS spam has prompted the adoption of machine learning and deep learning models for detecting it. This paper investigates how the choice of word embedding technique affects the performance of SMS spam detection models. Traditional statistical representations (BoW, TF-IDF) are compared with more advanced techniques (Word2Vec, fastText, GloVe, PhoBERT) on a proprietary dataset. The evaluation uses accuracy, precision, recall, and F1 score. PhoBERT combined with a CNN model achieved the highest accuracy, 0.968, with an F1 score of 0.941. The study clarifies the role of word embeddings in building robust spam detection models and offers practical guidance for model selection. The methodology, the comparative analysis, and directions for future work are presented.
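The four evaluation metrics named above can be computed from a confusion matrix as in the following minimal sketch. It assumes a binary setting in which label 1 marks spam and is treated as the positive class; the abstract does not state which class is positive or whether any averaging is applied, so this is an illustration rather than the paper's evaluation code. The function name `binary_metrics` is hypothetical.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary spam classifier.

    Assumption (not from the paper): `positive` is the spam label.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with spam as the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: 3 spam and 7 ham messages, one spam missed, one ham flagged.
metrics = binary_metrics([1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
                         [1, 0, 0, 0, 0, 1, 0, 1, 0, 0])
print(metrics)
```

In this toy example accuracy is 0.8 while precision, recall, and F1 are all 2/3. When ham messages dominate, correct ham predictions inflate accuracy but do not enter the spam-class F1, which is one plausible reason a reported accuracy can exceed the F1 score; the abstract itself does not confirm this explanation.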
Published: 2024-05-04