Efficient fault diagnosis using incremental alarm correlation and active investigation for internet and overlay networks

Yongning Tang*, Ehab Al-Shaer, Raouf Boutaba

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

Fault localization is the core element of fault management. A symptom-fault map is commonly used to describe symptom-fault causality in fault reasoning. For Internet service networks, a well-designed monitoring system can effectively correlate observable symptoms (i.e., alarms) with critical network faults (e.g., link failures). However, lost and spurious symptoms can significantly degrade the performance and accuracy of a passive fault localization system. For overlay networks, limited access to the underlying network, together with overlay scalability and dynamics, makes it impractical to build a static overlay symptom-fault map. In this paper, we first propose a novel Active Integrated fault Reasoning (AIR) framework that incrementally incorporates active investigation actions into the passive fault reasoning process, based on an extended Symptom-Fault-Action (SFA) model. Second, we propose an Overlay Network Profile (ONP) to facilitate the dynamic creation of an Overlay Symptom-Fault-Action (O-SFA) model, so that the AIR framework can be applied seamlessly to overlay networks (O-AIR). We also present the corresponding fault reasoning and action selection algorithms. Extensive simulations and Internet experiments show that AIR and O-AIR can significantly improve both the accuracy and the performance of fault reasoning in Internet and overlay service networks, especially when the ratio of lost and spurious symptoms is high.
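The general idea of combining passive reasoning over a symptom-fault map with active investigation can be illustrated with a small Python sketch. It is not the AIR algorithm from the paper. It assumes a deterministic symptom-fault map, a greedy set-cover heuristic for the passive step, a hypothetical `min_ratio` credibility threshold for tolerating lost symptoms, and hypothetical actions that each verify a fixed set of symptoms at a given cost. All names (`SFAModel`, `passive_hypothesis`, `select_actions`, `diagnose`, and the example faults, symptoms, and actions) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SFAModel:
    """Hypothetical, deterministic Symptom-Fault-Action model (illustration only)."""
    fault_symptoms: dict[str, set[str]]         # fault -> symptoms it causes
    actions: dict[str, tuple[float, set[str]]]  # action -> (cost, symptoms it verifies)


def passive_hypothesis(model, observed, refuted=frozenset(), min_ratio=0.5):
    """Greedy set cover of the observed symptoms.

    A fault is a candidate only if at least `min_ratio` of its symptoms were
    observed (tolerating lost symptoms) and none was actively refuted.
    Observed symptoms left unexplained are treated as possibly spurious.
    """
    unexplained = set(observed)
    hypothesis = []
    while unexplained:
        best, best_gain = None, 0
        for fault, symptoms in model.fault_symptoms.items():
            if fault in hypothesis or symptoms & refuted:
                continue
            ratio = len(symptoms & observed) / len(symptoms)
            gain = len(symptoms & unexplained)
            if ratio >= min_ratio and gain > best_gain:
                best, best_gain = fault, gain
        if best is None:
            break
        hypothesis.append(best)
        unexplained -= model.fault_symptoms[best]
    return hypothesis, unexplained


def select_actions(model, hypothesis, observed, refuted, done):
    """Greedy cost-weighted cover of symptoms the hypothesis predicts but that
    have been neither observed nor refuted (candidate lost symptoms)."""
    unknown = set()
    for fault in hypothesis:
        unknown |= model.fault_symptoms[fault] - observed - refuted
    chosen = []
    while unknown:
        best, best_score = None, 0.0
        for action, (cost, verifies) in model.actions.items():
            if action in done or action in chosen:
                continue
            score = len(verifies & unknown) / cost
            if score > best_score:
                best, best_score = action, score
        if best is None:
            break
        chosen.append(best)
        unknown -= model.actions[best][1]
    return chosen


def diagnose(model, observed, probe, min_ratio=0.5, max_rounds=5):
    """Alternate passive reasoning and active verification until no useful
    action remains or the round budget is spent."""
    observed, refuted, done = set(observed), set(), set()
    for _ in range(max_rounds):
        hypothesis, unexplained = passive_hypothesis(model, observed, refuted, min_ratio)
        actions = select_actions(model, hypothesis, observed, refuted, done)
        if not actions:
            return hypothesis, unexplained
        for action in actions:
            done.add(action)
            verifies = model.actions[action][1]
            observed |= probe(action) & verifies  # symptoms the probe confirmed
            refuted |= verifies - observed        # checked but absent
    return passive_hypothesis(model, observed, refuted, min_ratio)


# Example: s2 is spurious and s4 was lost; the real fault is link_BC.
model = SFAModel(
    fault_symptoms={"link_AB": {"s1", "s2", "s3"}, "link_BC": {"s3", "s4"}},
    actions={"probe_s1": (1.0, {"s1"}), "probe_s4": (1.0, {"s4"})},
)
truth = {"s3", "s4"}  # symptoms actually present in the network


def probe(action):
    return model.actions[action][1] & truth


print(diagnose(model, {"s2", "s3"}, probe))  # (['link_BC'], {'s2'})
```

With only the observed symptoms, the passive step favours `link_AB`. Probing `s1` refutes that hypothesis, probing `s4` confirms `link_BC`, and `s2` is left flagged as unexplained. A real system would model fault-symptom relations probabilistically rather than discarding a fault after a single refuted symptom, as this sketch does.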

Original language: English
Pages (from-to): 36-49
Number of pages: 14
Journal: IEEE Transactions on Network and Service Management
Volume: 5
Issue number: 1
State: Published - Mar 2008
Externally published: Yes

Keywords

  • Fault localization
  • Fault reasoning
  • Overlay networks
  • Symptom-fault map

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Electrical and Electronic Engineering

