Abstract
Clustering data into groups according to a similarity measure is useful in many domains, including data mining, image classification, speech recognition, fraud detection, and network traffic anomaly detection. Typically, a clustering algorithm partitions a dataset into a fixed number of clusters supplied by the user. In this paper, we propose a clustering method based on a single Hidden Markov Model (HMM) that identifies a suitable number of clusters in a given dataset without prior knowledge of that number. Initially, the dataset is partitioned into windows of fixed size based on the HMM log-likelihood values. This provides a framework for identifying the most appropriate number of clusters (windows of varying sizes). Once the number of clusters is determined, the data values are labeled and allocated to clusters. The algorithm is tested on a number of benchmark datasets. For both small and large datasets (KDD 1999 Intrusion Detection dataset), the proposed algorithm performed significantly better than other commonly used clustering algorithms.
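The windowing idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify how fixed-size windows become windows of varying sizes, so the merge rule below (join adjacent windows whose mean scores differ by less than `gap`) is an assumption, as are the names `merge_windows`, `window_size`, and `gap`. The input `scores` are stand-ins for per-point HMM log-likelihood values, which would come from a trained HMM in practice.

```python
from statistics import mean


def merge_windows(scores, window_size, gap):
    """Illustrative only: cluster points by merging windows of sorted scores.

    scores      -- per-point values (stand-ins for HMM log-likelihoods)
    window_size -- points per initial fixed-size window (hypothetical parameter)
    gap         -- merge adjacent windows whose means differ by less than
                   this (hypothetical rule, not taken from the paper)
    Returns one cluster label per input point.
    """
    # Sort point indices by score, then cut into fixed-size windows.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    windows = [order[k:k + window_size]
               for k in range(0, len(order), window_size)]

    clusters = []
    prev_mean = None
    for w in windows:
        w_mean = mean(scores[i] for i in w)
        if clusters and abs(w_mean - prev_mean) < gap:
            clusters[-1].extend(w)      # close to previous window: grow it
        else:
            clusters.append(list(w))    # clear jump in score: new cluster
        prev_mean = w_mean

    # Map each point back to the index of the cluster that contains it.
    labels = [0] * len(scores)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Made-up scores forming three visibly separated groups.
scores = [-1.0, -1.1, -0.9, -1.2, -5.0, -5.1, -4.9, -5.2, -9.0, -9.1]
labels = merge_windows(scores, window_size=2, gap=1.0)
```

In this sketch the number of clusters is not supplied; it falls out of how many score jumps exceed `gap`, which mirrors the abstract's goal only loosely.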
| Original language | English |
|---|---|
| Pages | 57-66 |
| Number of pages | 10 |
| State | Published - 2006 |
| Externally published | Yes |
Keywords
- Fuzzy c-means
- HMM
- SOM
- Unsupervised clustering
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Information Systems