Household Appliance Identification Using Vision Transformers and Multimodal Data Fusion

Mohammed Ayub, El Sayed M. El-Alfy*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Accurately identifying household appliances from power consumption data collected via smart meters opens up new possibilities for improving energy management in smart homes and providing substantial benefits to both utilities and consumers. It enables real-time optimization of energy use, offers personalized savings recommendations, enhances demand forecasting, provides detailed appliance load profiling, and supports the promotion of energy-efficient technologies. While low-resolution consumption data are preferred due to the limited processing capabilities of residential smart meters, they lack the granularity needed to capture detailed consumption patterns, resulting in performance degradation in many cases. This paper explores a novel approach based on a revised version of a vision transformer for household appliance identification using low-resolution and low-volume data. To maintain superior algorithmic performance, we first fuse different time-series imaging techniques to augment and compensate for features that might be missed by a single technique, enabling efficient and robust feature representation. Next, real-time data augmentation and pretrained weights from Hugging Face transformers are leveraged and fine-tuned through transfer learning to enhance model performance with limited data, accelerate the training process, and improve model generalization. We compare three variants of our proposed solution: (i) multi-class classification, (ii) multi-label classification, and (iii) multi-target appliance-specific classification. Extensive experiments on four public datasets (ENERTALK, UK-DALE, iAWE, and REFIT) demonstrate that our proposed multimodal data fusion vision transformer outperforms non-fusion baseline models. It achieves near-perfect results across multi-class, multi-label, and multi-target tasks, with overall F1 scores above 97% and perfect scores for several appliances.
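The time-series imaging and fusion idea described above can be illustrated with a minimal NumPy sketch. It builds Gramian angular fields (GASF and GADF, the latter being one of the paper's keywords) from a 1-D power series and stacks them as channels of one multi-channel image; the function names, the 64-sample window, and the synthetic series are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def _to_angles(series):
    """Min-max rescale a 1-D series to [-1, 1] and map to polar angles."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    x = np.clip(x, -1.0, 1.0)  # guard against floating-point overshoot
    return np.arccos(x)

def gasf(series):
    """Gramian angular summation field: cos(phi_i + phi_j)."""
    phi = _to_angles(series)
    return np.cos(phi[:, None] + phi[None, :])

def gadf(series):
    """Gramian angular difference field: sin(phi_i - phi_j)."""
    phi = _to_angles(series)
    return np.sin(phi[:, None] - phi[None, :])

# Stand-in for a 64-sample smart-meter window; each imaging technique
# becomes one channel of the fused input image.
power = np.sin(np.linspace(0, 3 * np.pi, 64))
fused = np.stack([gasf(power), gadf(power)])  # shape (2, 64, 64)
```

Fusing complementary encodings this way means a pattern weakly expressed in one channel (e.g., the sign-sensitive GADF) can still be captured by another, which is the motivation the abstract gives for multimodal fusion.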
Several cross-house and cross-dataset experiments are also conducted to assess the generalization capability of the models on data from previously unseen households and datasets. Additionally, an ablation study demonstrates the model's scalability, as well as its computational and energy efficiency under different appliance combinations.
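The difference between the three task formulations compared in the paper can be sketched with plain NumPy decision rules over a shared logit vector; the appliance names, logit values, and 0.5 threshold are hypothetical, and in the actual model each formulation is trained with its own head and loss rather than sharing one output as here.

```python
import numpy as np

APPLIANCES = ["fridge", "kettle", "washer"]  # illustrative label set

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([1.2, -0.4, 0.7])  # stand-in backbone output

# (i) multi-class: exactly one appliance is assigned per window
multi_class = APPLIANCES[int(np.argmax(softmax(logits)))]

# (ii) multi-label: any subset of appliances may be active at once
multi_label = [a for a, p in zip(APPLIANCES, sigmoid(logits)) if p > 0.5]

# (iii) multi-target: one binary decision per appliance, each head
# trained specifically for its target appliance
multi_target = {a: bool(p > 0.5) for a, p in zip(APPLIANCES, sigmoid(logits))}
```

The multi-label and multi-target variants produce the same kind of per-appliance decisions, but the multi-target formulation dedicates a separate classifier head to each appliance, which is what lets the paper report appliance-specific scores.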

Original language: English
Pages (from-to): 2774-2792
Number of pages: 19
Journal: IEEE Transactions on Consumer Electronics
Volume: 71
Issue number: 2
DOIs
State: Published - 2025

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Keywords

  • Appliance identification
  • Gramian angular difference field
  • multi-target appliance classification
  • multilabel classification
  • time series
  • vision transformers

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering
