Abstract
Phishing scams delivered via SMS (smishing) have become common with the widespread use of smartphones and mobile Internet access. Identifying a phishing SMS by analyzing unstructured short text is a challenging problem in AI-driven cybersecurity. Machine learning techniques combined with natural language processing (NLP) have considerable potential to identify patterns that differentiate phishing from legitimate SMS. In this paper, we experiment with several state-of-the-art machine learning algorithms on a benchmark dataset, incorporating NLP-based feature extraction and feature selection steps to build an automated phishing detection strategy. The support vector machine classifier, applied after feature extraction and feature selection, outperformed the other tested methods, achieving a tenfold cross-validation score of 98.27%, an F1-score of 99.08% for legitimate SMS, and an accuracy of 98.39%. The performance of the tested methods was evaluated using popular evaluation metrics.
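The pipeline outlined in the abstract (feature extraction, feature selection, a support vector machine, and tenfold cross-validation) might look roughly like the following minimal sketch. It assumes scikit-learn, takes TF-IDF and the ANOVA F-test from the keyword list as the extraction and selection steps, and uses hypothetical toy messages rather than the benchmark dataset. The linear kernel and the 50% feature cut-off are illustrative choices, not settings reported by the authors.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy messages (assumption): NOT the benchmark dataset from the paper.
phishing = [
    "Your bank account is locked, verify your password at this link now",
    "You won a free prize, click the link to claim your reward today",
    "Urgent: confirm your card details to avoid account suspension",
    "Claim your cash reward now, reply with your bank PIN",
    "Your parcel is on hold, pay the delivery fee at this link",
    "Security alert: login to verify your account immediately",
    "Congratulations, you were selected for a free phone, click here",
    "Tax refund pending, submit your card number to receive payment",
    "Your account will be closed, update your password using this link",
    "Final notice: verify your identity now to claim your prize",
]
legitimate = [
    "Are we still meeting for lunch tomorrow at noon",
    "I will be home late tonight, please start dinner without me",
    "Thanks for the birthday wishes, see you at the weekend",
    "Can you send me the notes from the lecture this morning",
    "The meeting has moved to three in the afternoon",
    "Happy to help with the move on Saturday, what time",
    "Just landed, will call you when I get to the hotel",
    "Do not forget to bring the book I lent you last week",
    "Great game last night, want to play again on Friday",
    "Mum says dinner is at seven, bring dessert if you can",
]
texts = phishing + legitimate
labels = [1] * len(phishing) + [0] * len(legitimate)  # 1 = phishing, 0 = legitimate

pipeline = Pipeline([
    # Feature extraction: turn each SMS into a TF-IDF weighted term vector.
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    # Feature selection: keep the half of the terms with the highest ANOVA F-scores
    # (the 50% cut-off is an assumption for illustration).
    ("anova", SelectPercentile(score_func=f_classif, percentile=50)),
    # Classifier: SVM with a linear kernel (kernel choice is an assumption).
    ("svm", SVC(kernel="linear")),
])

# Tenfold stratified cross-validation; each fold holds out one message per class here.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
print(f"Mean tenfold accuracy on toy data: {scores.mean():.2f}")

# Fit on all toy data and classify a new, unseen message.
pipeline.fit(texts, labels)
prediction = pipeline.predict(["Click this link to verify your account and claim a prize"])
print("Predicted label:", prediction[0])
```

Because the vectorizer and the selector sit inside the `Pipeline`, they are refit on the training portion of every fold, so the held-out messages do not leak into the vocabulary or the F-scores. Scores on this toy data say nothing about the figures reported in the abstract.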
| Original language | English |
|---|---|
| Title of host publication | Lecture Notes on Data Engineering and Communications Technologies |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 677-689 |
| Number of pages | 13 |
| State | Published - 2022 |
| Externally published | Yes |
Publication series
| Name | Lecture Notes on Data Engineering and Communications Technologies |
|---|---|
| Volume | 95 |
| ISSN (Print) | 2367-4512 |
| ISSN (Electronic) | 2367-4520 |
Bibliographical note
Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Keywords
- ANOVA test
- Machine learning
- Natural language processing
- Smishing
- TF-IDF
ASJC Scopus subject areas
- Information Systems
- Media Technology
- Computer Science Applications
- Computer Networks and Communications
- Electrical and Electronic Engineering