Evaluation of Word Embedding Techniques for the Vietnamese SMS Spam Detection Model

Vu Minh Tuan; Tran Quang Anh; Do Thuy Duong

Authors

Vu Minh Tuan, Hanoi University
Tran Quang Anh
Do Thuy Duong

Keywords:

Vietnamese spam, SMS spam, deep learning, CNN, word embedding

Abstract

SMS spam in Vietnamese text messages is a growing problem and has prompted the adoption of machine learning and deep learning models for detection. This paper investigates how the choice of word embedding technique affects the performance of SMS spam detection models. Traditional statistical methods (BoW, TF-IDF) are compared with more advanced techniques (Word2Vec, fastText, GloVe, PhoBERT) on a proprietary dataset, with accuracy, precision, recall, and F1 score as the evaluation metrics. PhoBERT combined with a CNN model achieved the highest accuracy, 0.968, with an F1 score of 0.941. The study clarifies the role of word embeddings in building robust spam detection models and offers guidance for model selection. The methodology, comparative analysis, and future directions are presented.
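For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed for a binary spam/ham classifier. It is not the authors' evaluation code: the function name `binary_metrics`, the label encoding (1 for spam, 0 for ham), and the toy labels are assumptions made purely for illustration, and none of the numbers come from the paper's dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the spam class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of messages flagged as spam that really are spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual spam messages that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (assumed, not from the paper): 1 = spam, 0 = ham.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Because spam datasets are often imbalanced, accuracy alone can look high even when many spam messages are missed, which is one common reason to report F1 alongside it.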
