Abstract
With the increasing popularity of electronic mail, many individuals and companies have found it an easy way to quickly disseminate unsolicited messages to a large number of users at very low cost to the senders. Consequently, unsolicited or spam e-mail has become a major threat that can negatively impact the usability of electronic mail as a reliable means of communication. Besides wasting considerable time and money for business users and network administrators, spam consumes network bandwidth and server storage space, slows down e-mail servers, and provides a medium for distributing harmful and/or offensive content. Hence, incorporating a spam filtering subsystem has become an indispensable part of any modern e-mail system. In this chapter, we present an overview of the spam filtering problem and survey the state of the art in proposed and deployed machine learning based methods. We begin with a brief review of potential spam threats to network users and resources, and of some market analysis indicators of the spam growth rate. After that, we formally describe the machine learning spam filtering problem and discuss various approaches for representing e-mail messages and selecting relevant features. Then, we describe some common metrics and benchmark corpora for evaluating and comparing the performance of different learning methods for spam filtering. Next, we discuss various learning algorithms that have been applied to this problem and survey the related work. Finally, we present a case study that compares the performance of a number of these learning methods on one of the publicly available datasets.
| Original language | English |
|---|---|
| Title of host publication | Computer Systems, Support and Technology |
| Publisher | Nova Science Publishers, Inc. |
| Pages | 175-217 |
| Number of pages | 43 |
| ISBN (Print) | 9781611227598 |
| State | Published - 2011 |
Keywords
- Bayesian filter
- Boosting
- Classification
- Machine learning
- Memory-based learning
- Neural networks
- Spam filtering
- Support vector machines
- Text categorization
- Unsolicited Commercial E-mail
ASJC Scopus subject areas
- General Computer Science