A data clustering algorithm based on single Hidden Markov Model

Md Rafiul Hassan*, Baikunth Nath, Michael Kirley

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

10 Scopus citations

Abstract

The ability to cluster data into groups according to a particular similarity measure is useful in many domains, including data mining, image classification, speech recognition, fraud detection, and network traffic anomaly detection. Typically, a clustering algorithm partitions a dataset into a fixed number of clusters supplied by the user. In this paper, we propose a clustering method based on a single Hidden Markov Model (HMM) that identifies a suitable number of clusters in a given dataset without prior knowledge of that number. Initially, the dataset is partitioned into windows of fixed size based on the HMM log-likelihood values. This provides a framework for identifying the most appropriate number of clusters (windows of varying sizes). After the number of clusters is determined, the data values are labeled and allocated to clusters. The algorithm is tested on a number of benchmark datasets. On both small and large datasets (KDD 1999 Intrusion Detection dataset), the proposed algorithm performed significantly better than other commonly used clustering algorithms.
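
The windowing idea in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not say how fixed-size windows become windows of varying sizes, so the merge rule below (join adjacent windows whose mean log-likelihoods are close) is an assumption, as are the function name `cluster_by_loglik` and the parameters `window_size` and `merge_tol`. The per-item log-likelihood values are taken as given, assumed to come from a single trained HMM.

```python
from statistics import mean


def cluster_by_loglik(logliks, window_size=3, merge_tol=1.0):
    """Return one cluster label per item (hypothetical sketch).

    logliks: per-item log-likelihood values, assumed to have been
    computed beforehand by a single trained HMM.
    """
    # Order item indices by their log-likelihood value.
    order = sorted(range(len(logliks)), key=lambda i: logliks[i])

    # Step 1: cut the sorted order into fixed-size windows.
    windows = [order[i:i + window_size]
               for i in range(0, len(order), window_size)]

    # Step 2 (assumed rule): merge a window into the previous group
    # when their mean log-likelihoods differ by at most merge_tol.
    # The surviving groups are windows of varying sizes.
    merged = []
    for w in windows:
        w_mean = mean(logliks[i] for i in w)
        if merged and abs(w_mean - mean(logliks[i] for i in merged[-1])) <= merge_tol:
            merged[-1] = merged[-1] + w
        else:
            merged.append(list(w))

    # Step 3: label every item with the index of its group.
    labels = [0] * len(logliks)
    for cluster_id, group in enumerate(merged):
        for i in group:
            labels[i] = cluster_id
    return labels
```

The number of clusters is then simply the number of distinct labels returned, so it is a by-product of the merge step and is not supplied by the user.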

Original language: English
Pages: 57-66
Number of pages: 10
State: Published - 2006
Externally published: Yes

Keywords

  • Fuzzy c-means
  • HMM
  • SOM
  • Unsupervised clustering

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Information Systems
