This is an outdated version published on 2024-05-04. Read the most recent version.

Evaluation of Word Embedding Techniques for the Vietnamese SMS Spam Detection Model

Authors

Vu Minh Tuan, Hanoi University
Tran Quang Anh
Do Thuy Duong

Keywords:

Vietnamese spam, SMS spam, deep learning, CNN, word embedding

Abstract

The growing problem of SMS spam in Vietnamese text messages has prompted the adoption of machine learning and deep learning models for detection. This paper investigates how the choice of word embedding technique affects the performance of SMS spam detection models. Traditional statistical methods (BoW, TF-IDF) are compared with more advanced techniques (Word2Vec, fastText, GloVe, PhoBERT) on a proprietary dataset. The evaluation uses accuracy, precision, recall, and F1 score. PhoBERT combined with a CNN model achieved the highest accuracy, 0.968, with an F1 score of 0.941. The study clarifies the role of word embeddings in building robust spam detection models and offers guidance for model selection. The methodology, comparative analysis, and future directions are presented.
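For readers unfamiliar with the four evaluation metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed for a binary spam/ham task. It is not the authors' evaluation code. The function name `classification_metrics` and the toy labels are assumptions made purely for illustration, and spam is assumed to be the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the spam class.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with spam as the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration only (1 = spam, 0 = ham).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because spam is usually the minority class, F1 can sit noticeably below accuracy, which is why both figures are commonly reported together.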

Published

2024-05-04

Versions

2024-05-04 (2)
2024-05-04 (1)

Issue

Vol. 1 No. 3 (2023): Journal of Science and Technology on Information and Communications

Section

Computer Science
