Learning methods for spam filtering

El Sayed M. El-Alfy*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

With the increasing popularity of electronic mail, many individuals and companies have found it an easy way to quickly disseminate unsolicited messages to a large number of users at very low cost to the sender. Consequently, unsolicited, or spam, e-mail has become a major threat to the usability of electronic mail as a reliable means of communication. Besides wasting considerable time and money for business users and network administrators, spam consumes network bandwidth and server storage space, slows down e-mail servers, and provides a medium for distributing harmful and/or offensive content. Hence, a spam filtering subsystem has become an indispensable part of any modern e-mail system. In this chapter, we present an overview of the spam filtering problem and survey the state of the art in proposed and deployed machine-learning-based methods. We begin with a brief review of the threats that spam poses to network users and resources, and of some market analysis indicators of the spam growth rate. After that, we formally describe the machine learning spam filtering problem and discuss various approaches for representing e-mail messages and selecting relevant features. Then, we describe some common metrics and benchmark corpora for evaluating and comparing the performance of different learning methods for spam filtering. Next, we discuss various learning algorithms that have been applied to this problem and survey the related work. Finally, we present a case study that compares the performance of a number of these learning methods on one of the publicly available datasets.

Original language: English
Title of host publication: Computer Systems, Support and Technology
Publisher: Nova Science Publishers, Inc.
Pages: 175-217
Number of pages: 43
ISBN (Print): 9781611227598
State: Published - 2011

Keywords

  • Bayesian filter
  • Boosting
  • Classification
  • Machine learning
  • Memory-based learning
  • Neural networks
  • Spam filtering
  • Support vector machines
  • Text categorization
  • Unsolicited Commercial E-mail

ASJC Scopus subject areas

  • General Computer Science

