GMDM: A generalized multi-dimensional distribution overlap metric for data and model quality evaluation

  • Seongyong Park
  • Mohammad Sohail Ibrahim
  • Abdul Wahab
  • Shujaat Khan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

In this article, we design and analyze a generalized multi-dimensional distribution overlap metric (GMDM) as a generic tool for quantifying the similarity or difference between two multivariate distributions for the evaluation of data and model quality. Our experiments on different real-world datasets substantiate that the proposed metric is an apposite, intuitive alternative for quantifying the performance of trained models and the consistency of multiple datasets. The proposed metric is suitable, for example, for image quality assessment, evaluation of unsupervised models trained on unpaired data, image denoising models, feature embedding in protein sequence classification, and the estimation of class-specific consistency in datasets of variable dimensions such as MNIST, ORL, and CIFAR. Furthermore, we show that the proposed metric can serve as a batch-wise consistency score for evaluating the reproducibility of a surface-enhanced Raman spectroscopy (SERS) based molecule identification model, and we apply it to the domain adaptation problem in an anomaly detection task. Our results on diverse applications confirm the generalization of the proposed method and indicate good agreement of the GMDM with conventional data quality and performance metrics.

Original language: English
Article number: 103930
Journal: Digital Signal Processing: A Review Journal
Volume: 134
State: Published - 15 Apr 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023 Elsevier Inc.

Keywords

  • Distribution overlap
  • Domain adaptation
  • Generalized contrast-to-noise ratio
  • Image denoising
  • Style transfer
  • Unsupervised learning

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Statistics, Probability and Uncertainty
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
  • Electrical and Electronic Engineering
