Abstract
Future-generation networking technologies such as 5G and 6G will provide tremendous performance, network capacity, quality of service and connectivity. The convergence of these technologies with big data analytics in today's smart ecosystem will therefore create significant opportunities. Existing URL filtering techniques do not filter in real time, and they lack fault tolerance and scalability. We address these issues with a real-time, fault-tolerant and scalable machine-learning-based binary classification model that handles streams of URL traffic and classifies each URL as obscene or clean in real time. Using only URL-based features for classification, we still achieve an accuracy of 93% with a logistic regression classifier and 88%. Our model can filter 2 million URLs in 55 seconds. The proposed model achieves precision, recall and F1-score values of 0.92, 0.95 and 0.93, respectively.
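As a quick consistency check on the figures quoted above, the following minimal sketch recomputes the F1-score from the reported precision and recall, and converts the reported filtering time into an average throughput. The function names `f1_score` and `throughput` are illustrative and do not come from the paper. The throughput is a simple average over the whole run, which assumes nothing about how the load was distributed across the system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def throughput(n_items: int, seconds: float) -> float:
    """Average number of items processed per second."""
    return n_items / seconds


# Figures as reported in the abstract.
f1 = f1_score(0.92, 0.95)
rate = throughput(2_000_000, 55)

print(f"F1 = {f1:.2f}")                    # F1 = 0.93
print(f"Throughput = {rate:,.0f} URLs/s")  # Throughput = 36,364 URLs/s
```

The recomputed F1 agrees with the reported 0.93, and the reported run corresponds to roughly 36,000 URLs classified per second on average.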
Original language | English |
---|---|
Article number | 107379 |
Journal | Computers and Electrical Engineering |
Volume | 95 |
State | Published - Oct 2021 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2021 Elsevier Ltd
Keywords
- Big data analytics
- Logistic regression
- Machine learning
- URL filtering
ASJC Scopus subject areas
- Control and Systems Engineering
- General Computer Science
- Electrical and Electronic Engineering