Abstract
This article proposes a novel methodology for detecting malicious uniform resource locators (URLs) using a simulated expert (SE) and a knowledge-base system (KBS). The proposed approach not only detects known malicious URLs efficiently but also adapts its countermeasures to newly generated malicious URLs. The article also uses factor analysis to identify which lexical features contribute most to the final decision, which helps avoid the involvement of human experts. Furthermore, we apply the following state-of-the-art machine learning (ML) algorithms: naïve Bayes (NB), decision tree (DT), gradient boosted trees (GBT), generalized linear model (GLM), logistic regression (LR), deep learning (DL), and random forest (RF), and evaluate their performance on a large-scale real data set of data-driven Web applications. The experimental results demonstrate the efficiency of NB in the proposed model: NB outperforms the other algorithms with the lowest average execution time (3 s) and classifies the 107 586 URLs with 99.8% accuracy (a 0.2% error rate).
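To make "lexical features" concrete, the following is a minimal Python sketch of extracting such features from a URL string. The abstract does not list the features used in the article, so the feature set here and the function name `lexical_features` are illustrative assumptions, not the authors' method. A feature dictionary of this kind could then be passed to a classifier such as NB.

```python
import re
from urllib.parse import urlparse

# Assumption: this feature set is hypothetical; the abstract does not
# enumerate the lexical features analysed in the article.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Return a few illustrative lexical features computed from the URL text alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        # Number of non-empty path segments, e.g. /a/b.php has depth 2.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        # Hosts given as raw IPv4 addresses instead of domain names.
        "host_is_ipv4": bool(IPV4_HOST.match(host)),
        "has_at_symbol": "@" in url,
    }


if __name__ == "__main__":
    for u in (
        "https://example.com/docs/index.html",
        "http://192.168.0.1/login-update/verify.php?id=123",
    ):
        print(u, lexical_features(u))
```

These features depend only on the URL string, so they can be computed without fetching the page.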
Original language | English |
---|---|
Article number | 8908778 |
Pages (from-to) | 4497-4504 |
Number of pages | 8 |
Journal | IEEE Internet of Things Journal |
Volume | 7 |
Issue number | 5 |
State | Published - May 2020 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2014 IEEE.
Keywords
- Feature extraction
- URL classification
- malicious URLs
- naïve Bayes (NB)
- simulated experts (SEs)
ASJC Scopus subject areas
- Signal Processing
- Information Systems
- Hardware and Architecture
- Computer Science Applications
- Computer Networks and Communications