A neural network method for SpamAssassin rules generation

  • Hà Thanh Nguyễn Hanoi Department of Information and Communication
  • Quân Đình Đặng Faculty of Information Technology, Hanoi University
  • Anh Quang Trần Posts and Telecommunications Institute of Technology
Keywords: neural network, rules generation, spam filtering, SpamAssassin


SpamAssassin is widely used for spam filtering on e-mail servers because of its recognized real-time performance and ease of customization. Unfortunately, SpamAssassin does not support languages other than English by default. Although its default rule set for English spam detection is updated frequently, users usually have to train their own rule sets to match the signature of their particular e-mail traffic. Many methods have been proposed for generating SpamAssassin rules in various languages, including English [6], [9], [16], Chinese [11], Thai [17] and Vietnamese [12]. The general drawback of these methods is their reliance on hand-engineered feature selection, which is time-consuming because it requires extensive observation and analysis of the data. In this paper, we propose a multilayer neural network model for generating SpamAssassin rules that selects good features and optimizes rule weights at the same time. The weighted rule set obtained from training this network can be applied directly in SpamAssassin. Experiments show that our network is fast to train and that the resulting rule set achieves detection rates comparable to those of previous rule generation methods.
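
To make the idea of a weighted rule set that can be applied directly in SpamAssassin concrete, the following is a minimal sketch and not the multilayer network proposed in the paper. It assumes a single-layer logistic model over hypothetical keyword patterns, with the bias fixed at SpamAssassin's default `required_score` of 5.0, so that each learned weight can be written out as a rule score without rescaling. An L1 penalty stands in for feature selection by driving the scores of unhelpful patterns to zero. The toy messages, the pattern list, the `NN_RULE` prefix and all function names are illustrative assumptions; only the `body` and `score` configuration directives are SpamAssassin's own.

```python
import math
import re

REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold (required_score)


def train_rule_scores(messages, labels, patterns, epochs=500, lr=0.5, l1=0.01):
    """Fit one score per candidate pattern with L1-regularized logistic regression.

    The logit is (sum of scores of matching patterns) - REQUIRED_SCORE, so the
    learned weights are directly comparable to SpamAssassin rule scores.
    """
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    # Binary features: does pattern j match message i?
    feats = [[1.0 if rx.search(m) else 0.0 for rx in compiled] for m in messages]
    scores = [0.0] * len(patterns)
    for _ in range(epochs):
        grad = [0.0] * len(patterns)
        for x, y in zip(feats, labels):
            logit = sum(s * xi for s, xi in zip(scores, x)) - REQUIRED_SCORE
            p = 1.0 / (1.0 + math.exp(-logit))
            for j, xi in enumerate(x):
                grad[j] += (p - y) * xi
        for j in range(len(scores)):
            s = scores[j] - lr * grad[j] / len(feats)
            # Soft-thresholding (proximal L1 step) sets weak scores to exactly 0
            scores[j] = math.copysign(max(abs(s) - lr * l1, 0.0), s)
    return scores


def spam_score(message, patterns, scores):
    """Sum the scores of all patterns that match, as SpamAssassin does for rules."""
    return sum(s for p, s in zip(patterns, scores)
               if re.search(p, message, re.IGNORECASE))


def to_spamassassin_rules(patterns, scores, prefix="NN_RULE"):
    """Render surviving patterns as SpamAssassin `body` and `score` lines.

    Patterns are assumed not to contain '/', which would need escaping.
    """
    lines = []
    for i, (pat, score) in enumerate(zip(patterns, scores), start=1):
        if score == 0.0:
            continue  # pattern was pruned by the L1 penalty
        name = f"{prefix}_{i}"
        lines.append(f"body {name} /{pat}/i")
        lines.append(f"score {name} {score:.3f}")
    return "\n".join(lines)


# Toy data, invented for illustration only (1 = spam, 0 = ham)
messages = [
    "win free money now",
    "cheap viagra offer",
    "free money offer",
    "meeting at noon",
    "project report attached",
    "lunch meeting tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]
patterns = ["free", "money", "viagra", "meeting", "report"]

scores = train_rule_scores(messages, labels, patterns)
print(to_spamassassin_rules(patterns, scores))
```

Fixing the bias at the threshold is a design choice made for this sketch: a message is flagged when the summed scores of its matching rules reach `REQUIRED_SCORE`, which mirrors how SpamAssassin compares a message's total score against `required_score`.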